Methods of generating variant proteins with increased host string content and compositions thereof

ABSTRACT

The present invention relates to novel methods for generating variant proteins with increased host string content, and proteins that are engineered using these methods.

This application is a continuation of U.S. application Ser. No. 11/981,794, filed Oct. 31, 2007; U.S. application Ser. No. 11/981,794 is a continuation of U.S. application Ser. No. 11/004,590, filed Dec. 3, 2004, now U.S. Pat. No. 7,657,380, issued Feb. 2, 2010, and U.S. application Ser. No. 11/004,590 claims the benefit under 35 U.S.C. §119(e) of U.S. Ser. Nos. 60/527,167, filed Dec. 4, 2003; 60/581,613, filed Jun. 21, 2004; 60/601,665, filed Aug. 13, 2004; and 60/619,483, filed Oct. 16, 2004; all of which are expressly incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to novel methods for generating variant proteins with increased host string content, and proteins that are engineered using these methods.

BACKGROUND OF THE INVENTION

Many proteins that have the potential to be useful human therapeutics have a xenogeneic origin. The use of xenogeneic proteins for therapeutic purposes may be advantageous for a variety of reasons, including, for example, the established success of hybridoma technology for raising antibodies in rodents, and the possibility of higher efficacy with a xenogeneic protein than with a human counterpart. Although xenogeneic proteins are a rich source of potential therapeutic molecules, they remain a relatively untapped one. One reason for this is that nonhuman proteins are often immunogenic when administered to humans, thereby greatly reducing their therapeutic utility. Additionally, even engineered proteins of human origin may become immunogenic due to changes in the protein sequence.

Immunogenicity is the result of a complex series of responses to a substance that is perceived as foreign, and may include production of neutralizing and non-neutralizing antibodies, formation of immune complexes, complement activation, mast cell activation, inflammation, hypersensitivity responses, and anaphylaxis. Several factors can contribute to protein immunogenicity, including but not limited to protein sequence, route and frequency of administration, and patient population. Immunogenicity may limit the efficacy and safety of a protein therapeutic in multiple ways. Efficacy can be reduced directly by the formation of neutralizing antibodies. Efficacy may also be reduced indirectly, as binding to either neutralizing or non-neutralizing antibodies typically leads to rapid clearance from serum. Severe side effects and even death may occur when an immune reaction is raised. One special class of side effects results when neutralizing antibodies cross-react with an endogenous protein and block its function.

Because of the clinical success of monoclonal antibodies, immunogenicity reduction of these proteins has been an intense area of investigation. Antibodies are a unique system for the development of immunogenicity reduction methods because of the large number of highly conserved antibody sequences and the wealth of high-resolution structural information. A number of strategies for reducing antibody immunogenicity have been developed. The central aim of all of these approaches has been the reduction of nonhuman, and correspondingly immunogenic, content while maintaining affinity for the antigen.

The dominant method in use for antibody immunogenicity reduction, referred to as “humanization”, relies principally on the grafting of “donor” (typically mouse or rat) complementarity determining regions (CDRs) onto “acceptor” (human) variable light chain (VL) and variable heavy chain (VH) frameworks (FRs) (Tsurushita & Vasquez, 2004, Humanization of Monoclonal Antibodies, Molecular Biology of B Cells, 533-545, Elsevier Science (USA)). This strategy is referred to as “CDR grafting” (Winter U.S. Pat. No. 5,225,539). “Backmutation” of selected acceptor framework residues to the corresponding donor residues is often required to regain affinity that is lost in the initial grafted construct (U.S. Pat. No. 5,530,101; U.S. Pat. No. 5,585,089; U.S. Pat. No. 5,693,761; U.S. Pat. No. 5,693,762; U.S. Pat. No. 6,180,370; U.S. Pat. No. 5,859,205; U.S. Pat. No. 5,821,337; U.S. Pat. No. 6,054,297; U.S. Pat. No. 6,407,213). Despite the significant clinical application of antibodies engineered using these methods, the methods remain unreliable in their ability to reduce immunogenicity. A number of humanized antibodies have elicited substantial immune reactions in clinical studies, with incidences of immune response as high as 63% of patients (Ritter et al., 2001, Cancer Research 61: 6851-6859).

The incomplete capacity of current humanization methods for immunogenicity reduction is due to significant limitations imposed by the donor-acceptor approach. Historically, the use of a single donor has been part of methods aimed at engineering a single xenogeneic antibody to be suitable as a human biotherapeutic. However, the use of a single acceptor is not required. On the contrary, the use of an acceptor antibody, and the use of global homology to select it, place substantial restrictions on the immunogenicity reduction process. A principal problem is that the use of overall sequence similarity between nonhuman and human sequences as a metric for human immunogenicity is fundamentally flawed. This means of measuring the degree of humanness does not accurately account for the underlying molecular mechanisms of immune response. The immune system does not recognize antigens on the basis of global sequence similarity to human proteins. Rather, immune cells, including antigen presenting cells (APCs), T cells, and B cells, recognize linear or conformational motifs comprising only a handful of residues. A key step in antigen recognition is the formation of peptide-MHC-T cell receptor complexes. APCs express MHC molecules that recognize short (approximately nine residue) linear peptide sequences, referred to as MHC agretopes. T cells express T cell receptors that recognize T cell epitopes in the context of peptide-MHC complexes. T cells that recognize MHC agretopes that are present in human proteins typically undergo apoptosis or become anergic, while T cells that recognize foreign agretopes bound to MHC molecules may participate in an immune response. Thus the relevant quantity for the immunogenicity of a protein is not its global sequence similarity to a human sequence, but rather its sequence content of individual human epitopes.

The donor-acceptor model and the use of global sequence homology that it imposes fail in practice. Because CDRs are treated as inviolable, structural incompatibilities are introduced at the CDR-FR boundaries. Grafting of foreign donor CDRs onto a human acceptor framework creates a substantial number of nonhuman epitopes in each variable chain, including not only the epitopes in the foreign CDRs, but also the large number of epitopes at the FR-CDR boundaries. This FR-CDR incompatibility is evident when one sets aside global homology and looks at more local sequence homologies. CDR grafting generally maximizes the donor-acceptor homology of the frameworks at the expense of the CDRs (Clark, 2000, Immunology Today 21: 397-402). Ironically, this frequently results in lower global homology to human antibodies. In reality, the “cut and paste” approach to imparting the functional determinants of a nonhuman antibody onto the framework of a human one is unnecessary, as careful analysis of the antigen binding determinants of antibodies shows that, in fact, the majority of CDR residues are not involved in binding antigen (MacCallum et al., 1996, J. Mol. Biol. 262: 732-745). FR-CDR incompatibility causes not only immunological problems at the sequence level, but also conformational problems at the structural level. As a result, humanization methods based on CDR grafting often result in antigen affinity losses of 10-100-fold, necessitating backmutation to donor residues within the framework. This process of backmutation is a hallmark of essentially all current humanization efforts, and because it introduces yet additional nonhuman epitopes, highlights the inefficiency of these methods.

Methods that take an immune epitope approach to reducing antibody immunogenicity have been explored (U.S. Pat. No. 5,712,120; US 2003/0153043). Central to these methods is the determination of sequences within a xenogeneic antibody that are in fact immunogenic epitopes. Different methods for determination of immunogenicity, both theoretical and experimental, have been described, and include determination of potential for amphipathic helix formation, binding to MHC, and reactivity in a T-cell activation assay. A distinguishing feature between these strategies and the present invention is that the present invention makes no presumption as to the immunogenicity of specific epitopes. Rather, the primary goal is to maximize the content of human linear sequence strings in the xenogeneic antibody as determined by comparison to an alignment of human sequences. The relevant sequence dataset comprises strings that are nonimmunogenic for all relevant reasons, including lack of interaction with MHC, lack of interaction with T cell receptor, lack of proper processing necessary for presentation, and tolerance.

It is noted that the methods described in U.S. Pat. No. 5,712,120 and US 2003/0153043 suffer additionally in that they fail to address a significant concern for local level sequence engineering, namely the requirement for maintaining protein structure, stability, solubility, and function. Thus, although the sequence string approach to immunogenicity reduction is more accurate than CDR grafting, it will be optimal when coupled with protein design methodology that takes into account both local sequence content and conformational compatibility at the local and global structural level. In addition to providing scoring functions for assessing host string content, the present invention also describes scoring functions that evaluate other relevant properties of a protein that may be employed for the simultaneous immunogenicity reduction and structural and functional optimization of proteins.

In summary, the donor-acceptor model imposes significant restrictions on the immunogenicity reduction process. With regard to sequence, global sequence homology is an inappropriate metric for immunogenicity. With regard to structure, backmutations are needed to repair conformational incompatibilities, thereby creating or reintroducing nonhuman epitopes. The present invention describes a novel method for antibody immunogenicity reduction that steps outside of the donor-acceptor model, and thus the sequence and structural restrictions it imposes. The central strategy of the described method is that it maximizes the content of human linear sequence strings. In this way immunogenicity is addressed at the local sequence level, typically by utilizing the local sequence information contained in an alignment of human sequences. This strategy not only provides a more accurate measure of the immunogenicity, it enables substitutions to be designed in a forward rather than backward manner to repair problems introduced by the graft. In effect, by addressing immunogenicity at the local sequence string level, the optimal balance between binding determinants and humanness can be designed.

The present invention describes a novel method for reducing the immunogenicity of proteins that leverages the nonimmunogenic information contained in natural human sequences to score protein sequences for immunogenic content at the sequence string level. Furthermore, the described method capitalizes on recent advances in computational sequence and structure-based protein engineering methods to quantitatively and systematically determine the optimal balance between human sequence content and protein functionality. Because of the wealth of human sequence information available for the immunoglobulin protein family, application to human antibodies is emphasized. Applications to other proteins are also possible.

SUMMARY OF THE INVENTION

The invention disclosed herein provides a novel method for reducing the immunogenicity of a protein, wherein the method maximizes the content of sequence strings. In a preferred embodiment, the method of the present invention maximizes the content of human sequence strings.

It is an object of the present invention to provide scoring functions that may be used to evaluate the human sequence string content of a protein. In a preferred embodiment, the scoring function compares the similarity of strings in a protein sequence to the strings that compose a set of natural protein sequences. In another preferred embodiment, the set of sequences is an aligned set of germline sequences. In additional preferred embodiments, the set of sequences contains mature sequences. In the most preferred embodiments, the sequences are human sequences.

It is an object of the present invention to provide scoring functions that may be used to evaluate the structural and/or functional fitness of a protein.

It is an object of the present invention to provide protein variants of a parent protein that are engineered using the methods described herein. In a preferred embodiment, the parent protein is an immunoglobulin.

It is an object of the present invention to provide experimental methods for screening and testing the protein variants of the present invention.

The present invention provides isolated nucleic acids encoding the protein variants described herein. The present invention provides vectors comprising the nucleic acids, optionally, operably linked to control sequences. The present invention provides host cells containing the vectors, and methods for producing and optionally recovering the protein variants.

The present invention provides compositions comprising the protein variants described herein, and a physiologically or pharmaceutically acceptable carrier or diluent.

The present invention provides novel antibodies and Fc fusions that comprise the protein variants disclosed herein. The novel antibodies and Fc fusions may find use in a therapeutic product.

The present invention provides therapeutic treatment and diagnostic uses for the protein variants disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Human germline sequences and diversity. The sequences that are known to encode the human VH chains (FIG. 1 a) (SEQ ID NOS:1-53), human VL kappa chains (FIG. 1 b) (SEQ ID NOS:54-98), and VH and VL kappa J chains (FIG. 1 c) (SEQ ID NOS:99-109) are shown. The VL lambda germline sequences are not provided. The germline sequences are numbered according to the numbering scheme of Kabat (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, 5th Ed., United States Public Health Service, National Institutes of Health, Bethesda). The regions of the variable region are indicated above the numbering in FIGS. 1 a and 1 b, and these include framework regions 1 through 3 and the CDRs 1 through 3. Positions that make up the Kabat CDRs are underlined. The germline chains are grouped into 7 subfamilies for V_(H) and 6 subfamilies for V_(L), as is known in the art, and these subfamilies are grouped together and separated by a blank line. The sequences of the five germlines that make up the IgG light kappa J chains (IGKJ1-IGKJ5), and the six germlines that make up the IgG heavy J chains (IGHJ1-IGHJ6) are shown in FIG. 1 c. The kappa and lambda light J chains combine with the VLκ and VLλ germlines respectively to form the light chain variable region, and the heavy J chains combine with the VH germlines and heavy diversity (D) germlines (not shown) to form the heavy chain variable region. The V_(H) CDR3 is not part of the V_(H) germline, and is encoded by the D and J genes.

FIG. 2. The quantities described by equations 1, 2, and 3 are illustrated. In FIG. 2 a, IDstring (Equation 1) is illustrated for the string beginning at position i=15, comparing a region of the murine antibody m4D5 VH sequence (VH_m4D5) (SEQ ID NO:110) as parent sequence s with the homologous region from the VH human germline sequence (VH_1-2) (SEQ ID NO:111) as human sequence h. Only 30 residues from each sequence are shown, and the residues that compose the relevant string are bolded. In FIG. 2 b, IDmax (Equation 2) is illustrated for the parent sequence s string that begins at position i=15 (shown in bold) and the homologous regions from an aligned set of 7 VH human germline sequences (SEQ ID NOS:111-117). In FIG. 2 c, HSC(s) (Equation 3) is illustrated for all strings (i=1 to i=22) in the parent sequence s and the homologous regions from an aligned set of 7 VH human germline sequences (SEQ ID NOS:111-117).
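The three quantities illustrated in FIG. 2 can be sketched in code. The following is a minimal illustration only, since Equations 1 to 3 are not reproduced in this part of the description. It assumes that IDstring counts identical residues between two aligned strings of window size w, that IDmax takes the maximum of IDstring over the aligned human set, and that HSC averages IDmax over all strings and expresses the result as a percentage. The function names, the toy sequences, and the percentage normalization are assumptions made for illustration.

```python
def id_string(s, h, i, w=9):
    """Number of identical residues between the w-residue strings of the
    parent sequence s and a human sequence h that start at index i.
    Indices are 0-based here; the sequences are assumed pre-aligned, gap-free."""
    return sum(1 for a, b in zip(s[i:i + w], h[i:i + w]) if a == b)


def id_max(s, humans, i, w=9):
    """Best id_string value for the string starting at i over all human sequences."""
    return max(id_string(s, h, i, w) for h in humans)


def hsc(s, humans, w=9):
    """Assumed normalization: mean of id_max over all strings, as a percentage."""
    n_strings = len(s) - w + 1
    total = sum(id_max(s, humans, i, w) for i in range(n_strings))
    return 100.0 * total / (w * n_strings)


# Toy example with a short window (w=3); these are not real antibody sequences.
parent = "AAAA"
human_set = ["AAAB", "BAAA"]
score = hsc(parent, human_set, w=3)
```

In this toy case every string of the parent is matched exactly by some member of the human set, so the score is 100 even though the parent is not identical to any single human sequence. A 30-residue region scanned with w=9 gives 30-9+1=22 strings, which agrees with the range i=1 to i=22 given for FIG. 2 c.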

FIG. 3. Sequence, host string content, and structure of WT AC10 VL. FIG. 3 a shows the sequence of the WT AC10 VL (SEQ ID NO:118). FIG. 3 b shows the identity of each residue in WT AC10 VL as compared to the corresponding residue in each sequence of the human VLκ germline. The black horizontal lines delineate the 7 different subfamilies as presented in FIG. 1, and the black vertical lines delineate the different framework and CDR regions of the domain (in the order FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4). A grey square indicates that the germline sequence has the same amino acid identity as the residue at the corresponding position in the WT AC10 VL sequence. A white square indicates that the two sequences differ at that position. FIG. 3 c shows the continuous 8- and 9-mer strings between WT AC10 VL and each sequence of the human VLκ germline. The black horizontal and vertical lines are as described in FIG. 3 b. A grey square indicates that the germline sequence comprises a 9-mer string centered on that position that is an 8 out of 9 or 9 out of 9 identical match to the corresponding string (centered on the corresponding residue) in the WT AC10 VL sequence. FIG. 3 d shows the structure of the modeled WT AC10 variable region. The light chain is shown as grey ribbon, the heavy chain is shown as black ribbon, and the CDR residues are indicated as black lines.
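The grey and white squares of FIG. 3 c can be reproduced for one germline sequence along the following lines. This is a hedged sketch, not the procedure used to draw the figure: it assumes the two sequences are already aligned and of equal length, and it treats positions too close to either end to carry a full centered 9-mer as non-matching, which is an assumption. The function name and the toy sequences are hypothetical.

```python
def centered_match_map(parent, germline, w=9, min_identical=8):
    """For each position, report whether the w-mer centered on it matches the
    corresponding germline w-mer at min_identical or more positions."""
    half = w // 2
    flags = []
    for center in range(len(parent)):
        start, end = center - half, center + half + 1
        if start < 0 or end > len(parent):
            # No full window fits here; counted as no match (assumption).
            flags.append(False)
            continue
        identical = sum(a == b for a, b in zip(parent[start:end], germline[start:end]))
        flags.append(identical >= min_identical)
    return flags


# Toy sequences (not real antibody sequences).
one_mismatch = centered_match_map("ABCDEFGHIJK", "ABCDXFGHIJK")
two_mismatches = centered_match_map("ABCDEFGHIJK", "ABCDXFGHIJX")
```

With a single mismatch, every full window still has 8 of 9 identities and is flagged as a match. With two mismatches, the window that spans both drops to 7 of 9 and is no longer flagged.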

FIG. 4. Sequence (SEQ ID NO:119), host string content, and structure of WT AC10 VH. The figure is as described in the figure legend for FIG. 3, except that here the light chain is shown as black ribbon and the heavy chain is shown as grey ribbon.

FIG. 5. Sequence (SEQ ID NO:120), host string content, and structure of CDR grafted AC10 VL. CDR grafted AC10 VL was derived from the CDRs of WT AC10 and the frameworks of the human germline sequence vlk_4-1. Differences between CDR grafted AC10 VL and WT AC10 VL are shown as bolded residues in the sequence in FIG. 5 a, and as black ball and sticks in FIG. 5 d.

FIG. 6. Sequence (SEQ ID NO:121), host string content, and structure of CDR grafted AC10 VH. CDR grafted AC10 VH was derived from the CDRs of WT AC10 and the frameworks of the human germline sequence vh_1-3 and substitutions Q108L and A113S (Kabat numbering) in FR4.

FIG. 7. AC10 VL (SEQ ID NO:118) and VH variants with optimized HSC (SEQ ID NOS:122-160). AC10 VL (SEQ ID NO:119) and VH variants with optimized HSC (SEQ ID NOS:161-217). The nonredundant set of output sequences from the calculations described in Example 1 is shown. For each iteration (Iter) the following are provided: the Structural Consensus; Structural Precedence; Human String Content (HSC); Human String Similarity (HSS); N₉max; the Framework Region Homogeneity (FRH); and the number of mutations from WT (Muts). The output sequences were clustered based on their mutational distance from the other sequences in the set. These clusters are delineated by the horizontal black lines. The “Cluster” column provides this mutational distance quantitatively. Differences from the parent WT AC10 sequence are shown in grey. Positions are numbered according to the Kabat numbering scheme, provided at the top. The light grey regions bracketed by the black horizontal lines indicate residues in or proximal to the Kabat defined CDRs.

FIG. 8. Sequence (SEQ ID NO:218), host string content, and structure of L1 AC10 VL.

FIG. 9. Sequence (SEQ ID NO:219), host string content, and structure of L2 AC10 VL.

FIG. 10. Sequence (SEQ ID NO:220), host string content, and structure of L3 AC10 VL.

FIG. 11. Sequence (SEQ ID NO:221), host string content, and structure of H1 AC10 VH.

FIG. 12. Sequence (SEQ ID NO:222), host string content, and structure of H2 AC10 VH.

FIG. 13. Sequence (SEQ ID NO:223), host string content, and structure of H3 AC10 VH.

FIG. 14. AlphaScreen™ assay measuring binding between AC10 variants and the target antigen CD30. In the presence of competitor variant antibody, a characteristic inhibition curve is observed as a decrease in luminescence signal. The binding data were normalized to the maximum and minimum luminescence signal for each particular curve, provided by the baselines at low and high antibody concentrations respectively. The curves represent the fits of the data to a one site competition model using nonlinear regression, and the fits provide IC50s for each antibody.
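The normalization and curve shape described for FIG. 14 can be illustrated as follows. This is a sketch under stated assumptions rather than the analysis actually performed: it uses the commonly used one site competition form with a slope of one, in which the normalized signal equals 1/(1 + [competitor]/IC50), and it omits the nonlinear regression step itself. The function names and the numerical values are hypothetical.

```python
def normalize(signal, signal_min, signal_max):
    """Rescale a raw luminescence reading to the 0..1 range of its own curve.
    signal_max is the baseline at low competitor concentration and
    signal_min is the baseline at high competitor concentration."""
    return (signal - signal_min) / (signal_max - signal_min)


def one_site_competition(conc, ic50):
    """Normalized signal expected at competitor concentration conc
    (same units as ic50), assuming a single site and unit slope."""
    return 1.0 / (1.0 + conc / ic50)


# Hypothetical values for illustration only.
midpoint = normalize(75.0, signal_min=50.0, signal_max=100.0)
curve = [one_site_competition(c, ic50=2.0) for c in (0.0, 0.2, 2.0, 20.0, 200.0)]
```

Under this model the normalized signal is exactly one half when the competitor concentration equals the IC50, which is why the fitted IC50 can be read as the midpoint of the inhibition curve.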

FIG. 15. SPR sensorgrams showing binding of AC10 WT and variant full length antibodies to the CD30 target antigen. The curves consist of an association phase and dissociation phase, the separation being marked by a little spike on each curve.

FIG. 16. AlphaScreen™ assay measuring binding between AC10 variants and human V158 FcγRIIIa.

FIG. 17. Cell-based ADCC assay of WT and AC10 variants. Purified human peripheral blood mononuclear cells (PBMCs) were used as effector cells, L540 Hodgkin's lymphoma cells were used as target cells, and lysis was monitored by measuring LDH activity using the Cytotoxicity Detection Kit (LDH, Roche Diagnostic Corporation, Indianapolis, Ind.). Samples were run in triplicate to provide error estimates (n=3, +/−S.D.). FIG. 17 shows the dose dependence of ADCC at various antibody concentrations, and the curves represent the fits of the data to a sigmoidal dose-response model using nonlinear regression. Raw data are presented in FIGS. 17 a and 17 b, whereas in FIG. 17 c the data were normalized to a percentage scale of maximal cytotoxicity determined by Triton-X100 lysis of target cells.

FIG. 18. Cell-based assay measuring ADCC capacity of WT (H0/L0) and H3/L3 AC10 antibodies comprising Fc variants that provide enhanced effector function. Raw data were normalized to a percentage scale of maximal cytotoxicity determined by Triton-X100 lysis of target cells.

FIG. 19. AlphaScreen™ assay measuring binding between select H3L3 secondary AC10 variants and the target antigen CD30.

FIG. 20. Sequence (SEQ ID NO:224), host string content, and structure of L3.71 AC10 VL.

FIG. 21. Sequence (SEQ ID NO:225), host string content, and structure of L3.72 AC10 VL.

FIG. 22. Sequence (SEQ ID NO:226), host string content, and structure of H3.68 AC10 VH.

FIG. 23. Sequence (SEQ ID NO:227), host string content, and structure of H3.69 AC10 VH.

FIG. 24. Sequence (SEQ ID NO:228), host string content, and structure of H3.70 AC10 VH.

FIG. 25. Amino acid sequences of AC10 variant antibodies comprising the L3.71 AC10 variant VL with the CLK constant light chain (FIG. 25 a) (SEQ ID NO:229) and the H3.70 AC10 variant VH with IgG constant chains (FIGS. 25 b-25 e) (SEQ ID NOS:230-233) that may comprise amino acid modifications in the Fc region. FIG. 25 b (SEQ ID NO:230) provides an IgG1 heavy chain with positions that may be mutated designated in bold as X₁, X₂, X₃, and X₄, referring to residues S239, V264, A330, and I332. FIG. 25 c (SEQ ID NO:231) provides one example of a heavy chain described in FIG. 25 b, here comprising the H3.70 AC10 variant VH region with the S239D/A330L/I332E IgG1 constant region. FIG. 25 d (SEQ ID NO:232) provides an IgG2 heavy chain with positions that may be mutated designated in bold as X₁, X₂, X₃, X₄, Z₁, Z₂, Z₃, Z₄, and Z₅, referring to residues S239, V264, A330, I332, P233, V234, A235, -236, and G237 (here -236 refers to a deletion at EU index position 236). FIG. 25 e (SEQ ID NO:233) provides one example of a heavy chain described in FIG. 25 d, here comprising the H3.70 AC10 variant VH region with the S239D/A330L/I332E/P233E/V234L/A235L/-236G IgG2 constant region.

FIG. 26. Sequence (SEQ ID NO:234), host string content, and structure of WT C225 VL.

FIG. 27. Sequence (SEQ ID NO:235), host string content, and structure of WT C225 VH.

FIG. 28. Sequence (SEQ ID NO:236), host string content, and structure of CDR grafted C225 VL, which was derived from the CDRs of WT C225 and the frameworks of the human germline sequence vlk_6D-21 and an L106I (Kabat numbering) substitution in FR4.

FIG. 29. Sequence (SEQ ID NO:237), host string content, and structure of CDR grafted C225 VH, which was derived from the CDRs of WT C225 and the frameworks of the human germline sequence vh_4-30-4 and an A113S (Kabat numbering) substitution in FR4.

FIG. 30. C225 VL and VH variants with optimized HSC (SEQ ID NOS:238-274). The nonredundant set of output sequences from the calculations described in Example 2 is shown.

FIG. 31. C225 VL and VH variants with optimized HSC (SEQ ID NOS:275-378). The nonredundant set of output sequences from the calculations described in Example 2 is shown.

FIG. 32. Sequence (SEQ ID NO:379), host string content, and structure of L2 C225 VL.

FIG. 33. Sequence (SEQ ID NO:380), host string content, and structure of L3 C225 VL.

FIG. 34. Sequence (SEQ ID NO:381), host string content, and structure of L4 C225 VL.

FIG. 35. Sequence (SEQ ID NO:382), host string content, and structure of H3 C225 VH.

FIG. 36. Sequence (SEQ ID NO:383), host string content, and structure of H4 C225 VH.

FIG. 37. Sequence (SEQ ID NO:384), host string content, and structure of H5 C225 VH.

FIG. 38. Sequence (SEQ ID NO:385), host string content, and structure of H6 C225 VH.

FIG. 39. Sequence (SEQ ID NO:386), host string content, and structure of H7 C225 VH.

FIG. 40. Sequence (SEQ ID NO:387), host string content, and structure of H8 C225 VH.

FIG. 41. SPR sensorgrams showing binding of full length antibody C225 variants to the EGFR target antigen. The sensorgrams show binding of C225 WT (L0/H0) and variant (L0/H3, L0/H4, L0/H5, L0/H6, L0/H7, L0/H8, L2/H3, L2/H4, L2/H5, L2/H6, L2/H7, L2/H8, L3/H3, L3/H4, L3/H5, L3/H6, L3/H7, L3/H8, L4/H3, L4/H4, L4/H5, L4/H6, L4/H7, and L4/H8) full length antibodies to the EGFR sensor chip. The curves consist of an association phase and dissociation phase, the separation being marked by a little spike on each curve.

FIG. 42. Cell-based ADCC assay of C225 WT (L0/H0) and variant (L0/H3, L0/H4, L0/H5, L0/H6, L0/H7, L0/H8, L2/H3, L2/H4, L2/H5, L2/H6, L2/H7, L2/H8, L3/H3, L3/H4, L3/H5, L3/H6, L3/H7, L3/H8, L4/H3, L4/H4, L4/H5, L4/H6, L4/H7, and L4/H8) full length antibodies. Purified human peripheral blood mononuclear cells (PBMCs) were used as effector cells, A431 epidermoid carcinoma cells were used as target cells at a 10:1 effector:target cell ratio, and lysis was monitored by measuring LDH activity using the Cytotoxicity Detection Kit (LDH, Roche Diagnostic Corporation, Indianapolis, Ind.). Samples were run in triplicate to provide error estimates (n=3, +/−S.D.). FIG. 42 shows the dose dependence of ADCC at various antibody concentrations, normalized to the minimum and maximum levels of lysis for the assay. The curves represent the fits of the data to a sigmoidal dose-response model using nonlinear regression.

FIG. 43. Sequence (SEQ ID NO:388), host string content, and structure of WT ICR62 VL.

FIG. 44. Sequence (SEQ ID NO:389), host string content, and structure of WT ICR62 VH.

FIG. 45. Sequence (SEQ ID NO:390), host string content, and structure of CDR grafted ICR62 VL. CDR grafted ICR62 VL was derived from the CDRs of WT ICR62 and the frameworks of the human germline sequence vlk_1-17 and an L106I (Kabat numbering) substitution in FR4.

FIG. 46. Sequence (SEQ ID NO:391), host string content, and structure of CDR grafted ICR62 VH. CDR grafted ICR62 VH was derived from the CDRs of WT ICR62 and the frameworks of the human germline sequence vh_14 and substitutions A107T and S108L (Kabat numbering) in FR4.

FIG. 47. ICR62 VL and VH variants with optimized HSC (SEQ ID NOS:392-455).

FIG. 48. Sequence (SEQ ID NO:456), host string content, and structure of L3 ICR62 VL.

FIG. 49. Sequence (SEQ ID NO:457), host string content, and structure of H9 ICR62 VH.

FIG. 50. Sequence (SEQ ID NO:458), host string content, and structure of H10 ICR62 VH.

FIG. 51. Comparison of VH sequences humanized by the methods in the prior art versus the present method. Prior art antibodies include Ctm01, A5B7, Zenapax, MaE11, 1129, MHM2, H52, Huzaf, Hu3S193, D3H44, AQC2, 2C4, D3H44, Hfe7A, 5C8, m4D5, A.4.6.1, Campath, HuLys11, A.4.6.1, Mylotarg, MEDI-507, huH65_vh, EP-5C7, 9F3, HPC4, 38C2, Br96, 1A6, and 6.7. Sequences designed using the present invention, including AC10 H1, H2, and H3, C225 H3, H4, H5, H6, H7, and H8, and ICR62 H9 and H10, are offset to the right. FIG. 51 a provides the host string content (HSC) as defined by equation 3, FIG. 51 b provides the exact string content (ESC) as defined by equation 3a, and FIG. 51 c provides the framework region homogeneity (FRH) as defined by equation 10. Window size w was 9 for all calculations.

FIG. 52. Comparison of VL sequences humanized by the methods in the prior art versus the present method. Prior art antibodies include Ctm01, A5B7, Zenapax, MaE11, 1129, MHM2, H52, Huzaf, Hu3S193, D3H44, AQC2, 2C4, D3H44, Hfe7A, 5C8, m4D5, A.4.6.1, Campath, HuLys11, A.4.6.1, Mylotarg, MEDI-507, huH65_vh, EP-5C7, 9F3, HPC4, 38C2, Br96, 1A6, and 6.7. Sequences designed using the present invention, including AC10 L1, L2, and L3, C225 L2, L3, L4, and ICR62 L2, are offset to the right. FIG. 52 a provides the host string content (HSC) as defined by equation 3, FIG. 52 b provides the exact string content (ESC) as defined by equation 3a, and FIG. 52 c provides the framework region homogeneity (FRH) as defined by equation 10. Window size w was 9 for all calculations.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

In order that the invention may be more completely understood, several definitions are set forth below. Such definitions are meant to encompass grammatical equivalents.

By “amino acid” as used herein is meant one of the 20 naturally occurring amino acids or any non-natural analogues that may be present at a specific, defined position.

By “amino acid modification” herein is meant an amino acid substitution, insertion, and/or deletion in a polypeptide sequence. The preferred amino acid modification herein is a substitution.

By “amino acid substitution” or “substitution” herein is meant the replacement of an amino acid at a given position in a protein sequence with another amino acid.

By “antibody” herein is meant a protein consisting of one or more proteins substantially encoded by all or part of the recognized immunoglobulin genes. The recognized immunoglobulin genes, for example in humans, include the kappa (κ), lambda (λ), and heavy chain genetic loci, which together comprise the myriad variable region genes, and the constant region genes mu (μ), delta (δ), gamma (γ), epsilon (ε), and alpha (α), which encode the IgM, IgD, IgG, IgE, and IgA isotypes respectively. Antibody herein is meant to include full length antibodies and antibody fragments, and may refer to a natural antibody from any organism, an engineered antibody, or an antibody generated recombinantly for experimental, therapeutic, or other purposes. By “IgG” as used herein is meant a protein belonging to the class of antibodies that are substantially encoded by a recognized immunoglobulin gamma gene. In humans this class comprises IgG1, IgG2, IgG3, and IgG4.

By “corresponding” or “equivalent” residues as meant herein are residues that represent similar or homologous sequence and/or structural environments between a first and second protein, or between a first protein and set of multiple proteins. In order to establish homology, the amino acid sequence of a first protein is directly compared to the sequence of a second protein. After aligning the sequences, using one or more of the homology alignment programs known in the art (for example using conserved residues as between species), allowing for necessary insertions and deletions in order to maintain alignment (i.e., avoiding the elimination of conserved residues through arbitrary deletion and insertion), the residues equivalent to particular amino acids in the primary sequence of the first protein are defined. Alignment of conserved residues preferably should conserve 100% of such residues. However, alignment of greater than 75% or as little as 50% of conserved residues is also adequate to define equivalent residues. Corresponding residues may also be defined by determining structural homology between a first and second protein that is at the level of tertiary structure for proteins whose structures have been determined. In this case, equivalent residues are defined as those for which the atomic coordinates of two or more of the main chain atoms of a particular amino acid residue of the proteins (N on N, CA on CA, C on C and O on O) are within 0.13 nm and preferably 0.1 nm of each other after alignment. Alignment is achieved after the best model has been oriented and positioned to give the maximum overlap of atomic coordinates of non-hydrogen protein atoms of the proteins.
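The structural criterion for equivalent residues can be expressed as a short check. The sketch below assumes that the two structures have already been superimposed and that each residue is given as a mapping from main chain atom name to (x, y, z) coordinates in nanometres. The function name, the data layout, and the example coordinates are hypothetical. Coordinates taken from PDB files are in angstroms, so they would need to be divided by 10 first (0.13 nm is 1.3 Å).

```python
import math

MAIN_CHAIN_ATOMS = ("N", "CA", "C", "O")


def is_equivalent(res_a, res_b, cutoff_nm=0.13, min_atoms=2):
    """True if at least min_atoms matching main chain atoms (N on N, CA on CA,
    C on C, O on O) lie within cutoff_nm of each other after alignment."""
    close = 0
    for atom in MAIN_CHAIN_ATOMS:
        if atom in res_a and atom in res_b:
            if math.dist(res_a[atom], res_b[atom]) <= cutoff_nm:
                close += 1
    return close >= min_atoms


# Hypothetical coordinates in nm, for illustration only.
residue = {"N": (0.0, 0.0, 0.0), "CA": (0.146, 0.0, 0.0),
           "C": (0.2, 0.14, 0.0), "O": (0.32, 0.15, 0.0)}
nearby = {name: (x + 0.05, y, z) for name, (x, y, z) in residue.items()}
distant = {name: (x + 0.5, y, z) for name, (x, y, z) in residue.items()}
```

Passing cutoff_nm=0.1 applies the preferred, stricter threshold stated in the definition.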

By “CDR” as used herein is meant a Complementarity Determining Region of an antibody variable domain. Systematic identification of residues included in the CDRs has been developed by Kabat (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, 5th Ed., United States Public Health Service, National Institutes of Health, Bethesda) and alternately by Chothia (Chothia & Lesk, 1987, J. Mol. Biol. 196: 901-917; Chothia et al., 1989, Nature 342: 877-883; Al-Lazikani et al., 1997, J. Mol. Biol. 273: 927-948). For the purposes of the present invention, CDRs are defined as a slightly smaller set of residues than the CDRs defined by Chothia. VL CDRs are herein defined to include residues at positions 27-32 (CDR1), 50-56 (CDR2), and 91-97 (CDR3), wherein the numbering is according to Chothia. Because the VL CDRs as defined by Chothia and Kabat are identical, the numbering of these VL CDR positions is also according to Kabat. VH CDRs are herein defined to include residues at positions 27-33 (CDR1), 52-56 (CDR2), and 95-102 (CDR3), wherein the numbering is according to Chothia. These VH CDR positions correspond to Kabat positions 27-35 (CDR1), 52-56 (CDR2), and 95-102 (CDR3).

By “framework” as used herein is meant the region of an antibodyvariable domain exclusive of those regions defined as CDR's. Eachantibody variable domain framework can be further subdivided into thecontiguous regions separated by the CDR's (FR1, FR2, FR3 and FR4).

By “germline” as used herein is meant the set of sequences that composethe natural genetic repertoire of a protein, and its associated alleles.

By “host” as used herein is meant a family, genus, species orsubspecies, group of individuals or even a single individual. A hostgroup of individuals can be selected for based upon a variety ofcriteria, such as MHC allele composition, etc. In a preferredembodiment, a host is canine, murine, primate, or human. In the mostpreferred embodiment, a host is human.

By “host string” or “host sequence” as used herein is meant a string orsequence that encodes any part of a naturally occurring host protein.

By “humanized” antibody as used herein is meant an antibody comprising ahuman framework region and one or more CDR's from a non-human (usuallymouse or rat) antibody. The non-human antibody providing the CDR's iscalled the “donor” and the human immunoglobulin providing the frameworkis called the “acceptor”. One says that the donor antibody has been“humanized”, by the process of “humanization”.

By “identity” as used herein is meant the number of residues in a firstsequence that are identical to the residues in a second sequence afteralignment of the sequences to achieve the maximum identity.

By “immune epitope” or “epitope” herein is meant a linear sequence ofamino acids that is located in a protein of interest. Epitopes may beanalyzed for their potential for immunogenicity. Epitopes may be anylength, preferably 9-mers.

By “immunogenicity” herein is meant the ability of a protein to elicitan immune response, including but not limited to production ofneutralizing and non-neutralizing antibodies, formation of immunecomplexes, complement activation, mast cell activation, inflammation,and anaphylaxis.

By “immunoglobulin (Ig)” herein is meant a protein consisting of one ormore proteins substantially encoded by immunoglobulin genes.Immunoglobulins include but are not limited to antibodies.Immunoglobulins may have a number of structural forms, including but notlimited to full length antibodies, antibody fragments, and individualimmunoglobulin domains. By “immunoglobulin (Ig) domain” herein is meanta region of an immunoglobulin that exists as a distinct structuralentity as ascertained by one skilled in the art of protein structure. Igdomains typically have a characteristic β-sandwich folding topology. Theknown Ig domains in the IgG class of antibodies are V_(H), Cγ1, Cγ2,Cγ3, V_(L), and C_(L).

By “natural sequence” or “natural protein” as used herein is meant aprotein that has been determined to exist absent any experimentalmodifications. Also included are sequences that can be predicted toexist in nature based on experimentally determined sequences. An exampleof such a predicted sequence is an antibody that can be predicted toexist based on the established patterns of germline recombination. Inthis case the large size of the predicted antibody repertoire makes theactual experimental determination of all mature recombined antibodiesnot practical.

By “parent” or “parent protein” as used herein is meant a protein thatis subsequently modified to generate a variant. The parent protein maybe a naturally occurring protein, or a variant or engineered version ofa naturally occurring protein. Parent protein may refer to the proteinitself, compositions that comprise the parent protein, or the amino acidsequence that encodes it. Accordingly, by “parent antibody” as usedherein is meant an antibody that is subsequently modified to generate avariant antibody. Accordingly, by “parent sequence” as used herein ismeant the sequence that encodes the parent protein or parent antibody.

By “position” as used herein is meant a location in the sequence of aprotein. Positions may be numbered sequentially, or according to anestablished format, for example Kabat, Chothia, and/or the EU index asin Kabat.

By “protein” herein is meant at least two covalently attached aminoacids, which includes proteins, polypeptides, oligopeptides andpeptides. The protein may be made up of naturally occurring amino acidsand peptide bonds, or synthetic peptidomimetic structures.

By “reduced immunogenicity” herein is meant a decreased ability toactivate the immune system, when compared to the parent protein. Forexample, a protein variant can be said to have “reduced immunogenicity”if it elicits neutralizing or non-neutralizing antibodies in lower titeror in fewer patients than the parent protein. A protein variant also canbe said to have “reduced immunogenicity” if it shows decreased bindingto one or more MHC alleles or if it induces T cell activation in adecreased fraction of patients relative to the parent protein.

By “residue” as used herein is meant a position in a protein and itsassociated amino acid identity. For example, proline 9 (also referred toas Pro9, also referred to as P9) is a residue in the WT AC10 VH region.

By “scoring function” herein is meant any equation or method forevaluating the fitness of one or more amino acid modifications in aprotein. The scoring function may involve a physical or chemical energyterm, or may involve knowledge-, statistical-, sequence-based energyterms, and the like.

By “string” as used herein is meant a contiguous sequence that encodesany part of a protein. Strings may comprise any 2 or more linearresidues, with the number of contiguous residues being defined by the“window” or “window size”. Window sizes of 2-20 are preferred, with 7-13more preferred, with 9 most preferred.
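As a minimal sketch of how a sequence decomposes into strings for a given window size, the following Python snippet lists every contiguous window of a toy sequence. The function name `strings_of` and the example sequence are hypothetical and serve only as illustration.

```python
def strings_of(sequence, window=9):
    """Return all contiguous strings of length `window` in `sequence`."""
    if window < 2:
        raise ValueError("a string comprises 2 or more residues")
    # A sequence of length L yields L - window + 1 strings (none if L < window).
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


toy = "EVQLVESGGGLV"  # hypothetical 12-residue sequence, not taken from the figures
nine_mers = strings_of(toy, window=9)
print(len(nine_mers))  # 12 - 9 + 1 = 4
print(nine_mers[0])    # EVQLVESGG
```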

By “target” as used herein is meant the molecule that is boundspecifically by a protein. A target may be a protein, carbohydrate,lipid, or other chemical compound. The target of an antibody is itsantigen, also referred to as its target antigen.

By “variable region” as used herein is meant the region of animmunoglobulin that comprises one or more Ig domains substantiallyencoded by any of the VL (including Vκ and Vλ) and/or V_(H) genes thatmake up the light chain (including kappa and lambda) and heavy chainimmunoglobulin genetic loci respectively. A light or heavy chainvariable region (VL and VH) consists of a “framework” or “FR” regioninterrupted by three hypervariable regions referred to as“complementarity determining regions” or “CDRs”. The extent of theframework region and CDRs have been precisely defined, for example as inKabat (see “Sequences of Proteins of Immunological Interest,” E. Kabatet al., U.S. Department of Health and Human Services, (1983)), and as inChothia. The framework regions of an antibody, that is the combinedframework regions of the constituent light and heavy chains, serves toposition and align the CDRs, which are primarily responsible for bindingto an antigen.

By “variant protein” or “protein variant”, or “variant” as used herein is meant a protein that differs from a parent protein by virtue of at least one amino acid modification. Protein variant may refer to the protein itself, a composition comprising the protein, or the amino acid sequence that encodes it. Preferably, the protein variant has at least one amino acid modification compared to the parent protein, e.g. from about one to about ten amino acid modifications, and preferably from about one to about five amino acid modifications compared to the parent. The protein variant sequence herein will preferably possess at least about 80% homology with a parent protein sequence, more preferably at least about 90% homology, and most preferably at least about 95% homology. Accordingly, by “immunoglobulin variant” as used herein is meant an immunoglobulin that differs from a parent immunoglobulin by virtue of at least one amino acid modification.

By “wild type or WT” herein is meant an amino acid sequence or anucleotide sequence that is found in nature and includes allelicvariations. A WT protein has an amino acid sequence or a nucleotidesequence that has not been intentionally modified.

The protein variants of the present invention may be derived from parentproteins that are themselves from a wide range of sources. The parentprotein may be substantially encoded by one or more genes from anyorganism, including but not limited to humans, mice, rats, rabbits,camels, llamas, dromedaries, monkeys, preferably mammals and mostpreferably humans and mice and rats. Although in a preferred embodimentthe parent protein is nonhuman, in some embodiments of the presentinvention the parent protein may be human or similar to human. Theparent protein may comprise more than one protein chain, and thus may bea monomer or an oligomer, including a homo- or hetero-oligomer. In apreferred embodiment, the parent protein is an antibody, referred to asthe parent antibody. The parent antibody need not be naturallyoccurring. For example, the parent antibody may be an engineeredantibody, including but not limited to nonhuman and chimeric antibodies.The parent antibody may be fully human, obtained for example usingtransgenic mice (Bruggemann et al., 1997, Curr Opin Biotechnol8:455-458) or human antibody libraries coupled with selection methods(Griffiths et al., 1998, Curr Opin Biotechnol 9:102-108). The parentantibody need not be naturally occurring. For example, the parentantibody may be an engineered antibody, including but not limited tochimeric antibodies and humanized antibodies (Clark, 2000, Immunol Today21:397-402). The parent antibody may be an engineered variant of anantibody that is substantially encoded by one or more natural antibodygenes. In one embodiment, the parent antibody has been affinity matured,as is known in the art, or engineered in some other way. 
The parent antibodies of the present invention may be substantially encoded by immunoglobulin genes belonging to any of the antibody classes, and may comprise sequences belonging to the IgG class of antibodies, including IgG1, IgG2, IgG3, or IgG4, or alternatively the IgA (including subclasses IgA1 and IgA2), IgD, IgE, or IgM classes of antibodies.

Virtually any binding partner or antigen may be targeted by the proteins of the present invention. A number of biotherapeutic proteins and antibodies that are approved for use, in clinical trials, or in development may thus benefit from the immunogenicity reduction methods of the present invention. In a preferred embodiment, the less immunogenic protein of the present invention is an antibody. The less immunogenic antibody may comprise sequences belonging to the IgG (including IgG1, IgG2, IgG3, or IgG4), IgA (including subclasses IgA1 and IgA2), IgD, IgE, or IgM classes of antibodies, with the IgG class being preferred. The less immunogenic antibodies of the present invention may be full length antibodies, or antibody fragments. Constant regions need not be present, but if they are, they will likely be substantially identical to human immunoglobulin constant regions.

The constant region of the antibody may be modified in some way to makeit more effective therapeutically. For example, the constant region maycomprise substitutions that enhance therapeutic properties. Mostpreferred substitutions and optimized effector function properties aredescribed in U.S. Ser. No. 10/672,280, PCT US03/30249, and U.S. Ser. No.10/822,231, and U.S. Ser. No. 60/627,774, filed Nov. 12, 2004 andentitled “Optimized Fc Variants”. Other known Fc variants that may finduse in the present invention include but are not limited to thosedescribed in U.S. Pat. No. 6,737,056; PCT US2004/000643; U.S. Ser. No.10/370,749; PCT/US2004/005112; US 2004/0132101; U.S. Ser. No.10/672,280; PCT/US03/30249; U.S. Pat. No. 6,737,056, US 2004/0002587; WO2004/063351; Idusogie et al., 2001, J. Immunology 166:2571-2572; Hintonet al., 2004, J. Biol. Chem. 279(8): 6213-6216. In alternateembodiments, the constant region may comprise one or more engineeredglycoforms, as is known in the art (Umaña et al., 1999, Nat Biotechnol17:176-180; Davies et al., 2001, Biotechnol Bioeng 74:288-294; Shieldset al., 2002, J Biol Chem 277:26733-26740; Shinkawa et al., 2003, J BiolChem 278:3466-3473); (U.S. Pat. No. 6,602,684; U.S. Ser. No. 10/277,370;U.S. Ser. No. 10/113,929; PCT WO 00/61739A1; PCT WO 01/29246A1; PCT WO02/31140A1; PCT WO 02/30954A1); (Potelligent™ technology [Biowa, Inc.,Princeton, N.J.]; GlycoMAb™ glycosylation engineering technology[GLYCART biotechnology AG, Zürich, Switzerland]).

The protein variants of the present invention may find use in a widerange of protein products. In one embodiment the protein is atherapeutic, a diagnostic, or a research reagent, preferably atherapeutic. Alternatively, the protein of the present invention may beused for agricultural or industrial uses. In a preferred embodiment, theprotein is a therapeutic that is used to treat a disease. By “disease”herein is meant a disorder that may be ameliorated by the administrationof a pharmaceutical composition comprising a protein of the presentinvention. Diseases include but are not limited to autoimmune diseases,immunological diseases, infectious diseases, inflammatory diseases,neurological diseases, and oncological and neoplastic diseases includingcancer. In one embodiment, a protein of the present invention is theonly therapeutically active agent administered to a patient.Alternatively, the protein of the present invention is administered incombination with one or more other therapeutic agents, including but notlimited to cytotoxic agents, chemotherapeutic agents, cytokines, growthinhibitory agents, anti-hormonal agents, kinase inhibitors,anti-angiogenic agents, cardioprotectants, or other therapeutic agents.The proteins of the present invention may be combined with othertherapeutic regimens. For example, in one embodiment, the patient to betreated with the protein may also receive radiation therapy and/orundergo surgery. In an alternate embodiment, the protein of the presentinvention is conjugated or operably linked to another therapeuticcompound. The therapeutic compound may be a cytotoxic agent, achemotherapeutic agent, a toxin, a radioisotope, a cytokine, or othertherapeutically active agent. In yet another embodiment, a protein ofthe present invention may be conjugated to a protein or molecule forutilization in tumor pretargeting or prodrug therapy. Othermodifications of the proteins of the present invention are contemplatedherein. 
For example, the protein may be linked to one of a variety of nonproteinaceous polymers, e.g., polyethylene glycol (PEG).

Pharmaceutical compositions are contemplated wherein a protein of thepresent invention and one or more therapeutically active agents areformulated. Formulations of the proteins of the present invention areprepared for storage by mixing the protein having the desired degree ofpurity with optional pharmaceutically acceptable carriers, excipients orstabilizers (Remington's Pharmaceutical Sciences 16th edition, Osol, A.Ed.,1980), in the form of lyophilized formulations or aqueous solutions.The formulations to be used for in vivo administration are preferablysterile. The proteins disclosed herein may also be formulated asimmunoliposomes, or entrapped in microcapsules. The concentration of theprotein of the present invention in the formulation may vary from about0.1 to 100 weight %. In a preferred embodiment, the concentration of theprotein is in the range of 0.003 to 1.0 molar. In order to treat apatient, a therapeutically effective dose of the protein of the presentinvention may be administered. The exact dose will depend on the purposeof the treatment, and will be ascertainable by one skilled in the artusing known techniques. Dosages may range from 0.01 to 100 mg/kg of bodyweight or greater, for example 0.1, 1, 10, or 50 mg/kg of body weight,with 1 to 10 mg/kg being preferred. Administration of the pharmaceuticalcomposition comprising a protein of the present invention, preferably inthe form of a sterile aqueous solution, may be done in a variety ofways, including, but not limited to, orally, subcutaneously,intravenously, intranasally, intraotically, transdermally, topically,intraperitoneally, intramuscularly, intrapulmonary, inhalably,vaginally, parenterally, rectally, or intraocularly. As is known in theart, the pharmaceutical composition may be formulated accordinglydepending upon the manner of introduction.

Description of the Methodology

The present invention provides a novel method for reducing the immunogenicity of a protein. A central principle of the described method is that substitutions are designed to maximize the content of human linear sequence strings using an alignment of human sequences. For application to antibodies, this approach to immunogenicity reduction excludes the use of the single donor-acceptor model employed in humanization methods. By stepping outside of the limitations imposed by the need to choose a human acceptor sequence a priori, a more immunologically relevant approach to immunogenicity reduction is enabled. Sequence information and structural information may be used to score potential amino acid substitutions. The scoring results are used to design protein variant libraries, which are subsequently screened experimentally to determine favorable substitutions. Feedback from experimental data may guide subsequent iterations of design and experimental screening, ultimately enabling protein variants to be engineered with the optimal balance between biophysical and immunological constraints.

Sequences

Central to the method described herein is that a set of host sequences provides information as to the degree to which linear sequence strings have the potential to be immunogenic. Thus the set of sequences employed is an important parameter. In the most common embodiment, the sequences are a set of human sequences that are homologous in sequence and/or structure to the parent sequence. As is known in the art, some proteins share a common structural scaffold and are homologous in sequence. This information may be used to gain insight into particular positions in the protein family. Sequence alignments are often carried out to determine which protein residues are conserved and which are not conserved. That is to say, by comparing and contrasting alignments of protein sequences, the degree of variability at a position may be observed, and the types of amino acids that occur naturally at positions may be observed. Thus for the present invention, typically the sequences are aligned such that the conserved or similar residues that exist between the parent sequence and the set of human sequences and among the set of human sequences can be identified. Methods for sequence alignment are well known in the art, and include alignments based on sequence and structural homology.

Protein sequence information can be obtained, compiled, and/or generated from sequence alignments of naturally occurring proteins from any organism, including but not limited to mammals. Because a preferred embodiment of the present invention is directed towards immunogenicity reduction for biotherapeutics, the sequences that compose the set are most preferably human. The source of the sequences may vary widely, may be a database that is compiled publicly or privately, and may include one or more of the known general protein and nucleic acid sequence databases, including but not limited to SwissProt, GenBank® and Entrez®, and EMBL Nucleotide Sequence Database. Because a preferred embodiment of the present invention is its application to the immunogenicity reduction of immunoglobulins, a number of immunoglobulin databases may be useful for obtaining sequences, including but not limited to the Kabat database (Johnson & Wu, 2001, Nucleic Acids Res 29:205-206; Johnson & Wu, 2000, Nucleic Acids Res 28:214-218), the IMGT database (IMGT, the international ImMunoGeneTics information system®; Lefranc et al., 1999, Nucleic Acids Res 27:209-212; Ruiz et al., 2000, Nucleic Acids Res 28:219-221; Lefranc et al., 2001, Nucleic Acids Res 29:207-209; Lefranc et al., 2003, Nucleic Acids Res 31:307-310), and VBASE.

As is well known in the art, immunoglobulins possess a high degree of sequence and structural homology, and therefore alignment of sequences provides a wealth of information. Due to the existence of deletions and insertions in these alignments, numbering conventions have been adopted to enable a normalized reference to conserved positions in immunoglobulin families or subfamilies. Those skilled in the art will appreciate that these conventions consist of nonsequential numbering in specific regions of an immunoglobulin sequence, and thus accordingly the positions of any given immunoglobulin as defined by any given numbering scheme will not necessarily correspond to its sequential sequence or to those in an alternate numbering scheme. For all variable regions discussed in the present invention, numbering is according to the numbering scheme of Kabat (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, 5th Ed., United States Public Health Service, National Institutes of Health, Bethesda). For all constant region positions discussed in the present invention, numbering is according to the EU index as in Kabat. Alternate numbering schemes may find use in the present invention, including but not limited to that of Chothia (Chothia & Lesk, 1987, J. Mol. Biol. 196: 901-917; Chothia et al., 1989, Nature 342: 877-883; Al-Lazikani et al., 1997, J. Mol. Biol. 273: 927-948).

In a most preferred embodiment, the set of human sequences used is an aligned set of human germline immunoglobulin sequences. For example, FIGS. 1a-1c (SEQ ID NOS: 1-109) provide the set of sequences that compose the human antibody variable region germline (VH, VL, and J chains), along with the corresponding diversity at each position. The human germline repertoires for immunoglobulin heavy chain variable regions and immunoglobulin light chain kappa variable regions have been reported (Matsuda et al., 1998, J Exp Med 188: 2151-2162; Zachau, 2000, Biol Chem 381:951-954; Pallares et al., 1999, Exp Clin Immunogenet 16(1): 36-60; Barbie & Lefranc, 1998, Exp Clin Immunogenet 15(3): 171-83). The rationale for use of this type of sequence information as a metric for humanness is that the strings that compose the human germline should be minimally immunogenic. Sequences need not be human genomic or germline sequences. In other preferred embodiments, human antibody variable region sequences are derived not from germline information, but rather from matured antibodies obtained for example from hybridoma technology or cDNA libraries.

For many of the genes in the human immunoglobulin germline, several different alleles have been identified. Although the polymorphisms detected in many of the alleles do not change the amino acid sequence of the gene, in a great number of cases the sequence is changed. In choosing a set of sequences to use in the method described herein, different sets of sequences may be chosen. When choosing a single allele as representative of a specific gene, the most cautious approach is to choose that sequence which is closest to the consensus of the entire germline. This subset of sequences would thereby be most likely to be represented within the population as a whole. Alternatively, a much greater sequence diversity could be sampled by choosing representative sequences that are furthest from the consensus. Another approach yielding greater diversity would be to use multiple alleles where they exist for each germline. At this time, there is little or no quantitative data on allele frequency within the population. When allele frequency data become available, a more informed decision can be made regarding the likelihood of tolerance for a specific non-consensus allele within the target patient population.

When two or more possible substitutions found in the human germline are being evaluated for use at a specific position, the decision may become subjective. In such a case additional information can be incorporated that may reflect different levels of expression of particular genes (Cox et al., 1994, Eur J Immunol 24(4):827-36). One underlying assumption of such a strategy would be that relative expression level of a particular germline (or corresponding sequence strings) correlates with the relative immunogenicity.

The sequences used for the method disclosed herein are those of homologous proteins with sufficient homology to allow their alignment with the protein whose immunogenicity is being reduced. One might argue that if a particular protein sequence is found anywhere within the expressed human genome, then there is innate tolerance to that peptide. Such a proposition greatly increases the number of possible sequences that could be used to reduce the immunogenicity of a protein. In such a case however, alignment of proteins that are not structurally homologous would likely be prohibitive. In addition, the processing of a protein to produce the strings to which tolerance is developed may be structurally determined. Therefore, a specific string may be nonimmunogenic in its native context but immunogenic in an altered structural context.

Scoring Functions—String Content

In order to evaluate the fitness of protein variants, amino acid modifications in the parent protein may be scored using a variety of scoring functions. Central to a preferred embodiment of the immunogenicity reduction method described herein is that at least one scoring function is aimed at maximizing the content of host linear sequence strings that are present in a set of host sequences. Typically, but not always, a computer is used to score potential amino acid substitutions.

In one embodiment, substitutions may be scored according to their occupancy in the set of host sequences, i.e., whether or not a given amino acid is part of the diversity at a given position. The use of position-specific alignment information to generate a list of considered amino acids at a variable position is well known in the art; see for example Lehmann & Wyss, 2001, Curr Opin Biotechnol 12(4): 371-5; Lehmann et al., 2000, Biochim Biophys Acta 1543(2):408-415; Rath & Davidson, 2000, Protein Sci, 9(12):2457-69; Lehmann et al., 2000, Protein Eng 13(1):49-57; Desjarlais & Berg, 1993, Proc Natl Acad Sci USA 90(6):2256-60; Desjarlais & Berg, 1992, Proteins 12(2):101-4; Henikoff & Henikoff, 2000, Adv Protein Chem 54:73-97; Henikoff & Henikoff, 1994, J Mol Biol 243(4):574-8. Thus, for example, for the parent nonhuman VLκ sequence aligned to the human sequences in FIG. 1b (SEQ ID NOS: 54-85), substitutions to be considered at position 1 would be Ala, Asp, Glu, Asn, and Val. In a more preferred embodiment, substitutions are scored based on their frequency in the set of human sequences listed. For example, in the previous example, Asp and Glu occur most frequently at position 1, and thus may be more preferable substitutions than Ala, Asn, or Val. The basis for this scoring function is that the frequency of a given amino acid at a given position in the alignment is proportional to its potential for being in a host string.
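Occupancy and frequency scoring at a single alignment position can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the invention: the function name `position_frequencies`, the gap character `-`, and the four-sequence toy alignment are all hypothetical, and the toy alignment is not the germline set of FIG. 1b.

```python
from collections import Counter


def position_frequencies(aligned_hosts, pos):
    """Map each amino acid seen at 0-based column `pos` to its frequency.

    Assumes the host sequences are already aligned and that '-' marks a gap.
    """
    column = [seq[pos] for seq in aligned_hosts if seq[pos] != "-"]
    if not column:
        return {}
    counts = Counter(column)
    return {aa: n / len(column) for aa, n in counts.items()}


# Hypothetical toy alignment of host sequences.
toy_hosts = ["DIQMT", "EIVLT", "DIVMT", "AIQLT"]

freqs = position_frequencies(toy_hosts, 0)
occupancy = set(freqs)                                  # amino acids allowed by occupancy
ranked = sorted(freqs, key=lambda aa: (-freqs[aa], aa)) # most frequent first
print(occupancy)
print(ranked)
```

Occupancy gives only the set of candidate substitutions; frequency additionally orders them, so that the most common host residue at the position is considered first.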

Occupancy and frequency provide relatively straightforward approximations for designing substitutions that have the potential for reduced immunogenicity. Their use, however, does not take into account the context of the parent sequence. Although frequency is proportional to the potential for a substitution to increase host content of a string, it is not a direct measure. In order to more accurately incorporate the information present in an aligned set of host sequences into a measure of immunogenicity, an approach can be taken wherein the linearity or contiguity of a given position in the context of the strings that comprise it is considered. In this most preferred embodiment, substitutions in a parent sequence are scored based on the probability of removing a nonhost string and replacing it with a less immunogenic string, namely one present in the set of host sequences. This method of scoring may employ the calculation of identity or percent identity of a parent string to a host string within a window of equivalent positions. In one embodiment, the identity of a string in sequence s to a host string in sequence h (IDstring) can be presented as the sum of amino acid sequence identities in a given window size, according to equation 1:

$$\mathrm{IDstring}(i,w) = \sum_{j=i}^{i+w-1} \delta_{aa_j^s,\,aa_j^h} \qquad \text{(Equation 1)}$$

where w is the string window size, i is the first position in the string, $aa_j^s$ is the amino acid at position j of sequence s, $aa_j^h$ is the amino acid at position j of the host sequence h, and the Kronecker delta function is used to return a value of 1 for a match (for example if the parent and host amino acids at position j are both serine) and 0 if there is no match (for example if the parent amino acid at position j is a serine but the host amino acid is a leucine). FIG. 2a illustrates equation 1 using a region of the VH of murine anti-Her2 antibody m4D5 (VH_m4D5) (SEQ ID NO: 110) as the parent sequence s and the homologous region from the VH human germline (VH_1-2) as the human sequence h (SEQ ID NO: 111).
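A minimal Python sketch of Equation 1 follows. It assumes that s and h are pre-aligned, gap-free strings of one-letter amino acid codes, and it uses 0-based indexing for i where the text counts from 1. The function name `id_string` and the two 9-residue strings are hypothetical and are not the sequences of FIG. 2a.

```python
def id_string(s, h, i, w):
    """Equation 1: count identical residues between s and h in one window.

    `i` is the 0-based start of the window and `w` its length; s and h are
    assumed to be aligned so that equal indices are equivalent positions.
    """
    if i < 0 or i + w > min(len(s), len(h)):
        raise ValueError("window extends beyond the aligned sequences")
    # The Kronecker delta is 1 for a match and 0 otherwise.
    return sum(1 for j in range(i, i + w) if s[j] == h[j])


parent = "QVQLKQSGP"  # hypothetical parent string
host = "QVQLVQSGA"    # hypothetical host string
print(id_string(parent, host, 0, 9))  # 7 of 9 positions match
```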

In a further embodiment, it is assumed that the most immunologically appropriate measure of host string content at position i is the maximal identity between a string of sequence s and any host sequence in the alignment, as calculated in equation 2:

$$\mathrm{IDmax} = \max_{h \in HS}\left(\sum_{j=i}^{i+w-1} \delta_{aa_j^s,\,aa_j^h}\right) \qquad \text{(Equation 2)}$$

where HS is the set of host sequences. In other words, if IDstring at position i is equal to w for any one of the host sequences, IDmax=w as well, and the i-th string is assumed to be minimally immunogenic. The concept of the IDmax quantity represented by Equation 2 is illustrated in FIG. 2b.
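Equation 2 can be sketched as a maximum over the host set of the window identity. As before, this assumes pre-aligned, gap-free sequences and 0-based positions; `id_max` and the toy host strings are hypothetical names and data introduced only for illustration.

```python
def id_max(s, hosts, i, w):
    """Equation 2: best window identity between s and any host sequence."""
    return max(
        sum(1 for j in range(i, i + w) if s[j] == h[j])
        for h in hosts
    )


parent = "QVQLKQSGP"                   # hypothetical parent string
hosts = ["QVQLVQSGA", "QVQLKQSGA"]      # hypothetical host set HS
print(id_max(parent, hosts, 0, 9))      # 8: the second host differs at one position

# Adding a host that matches the window exactly raises IDmax to w.
print(id_max(parent, hosts + ["QVQLKQSGP"], 0, 9))  # 9
```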

Finally, these equations can be combined to calculate a single numerical metric for total host string content (HSC) of a sequence s by summing the IDmax values over all pertinent sequence positions, as in equation 3:

$$\mathrm{HSC}(s) = \frac{100}{(L-w+1)\cdot w}\sum_{i=1}^{L-w+1}\max_{h \in HS}\left(\sum_{j=i}^{i+w-1} \delta_{aa_j^s,\,aa_j^h}\right) \qquad \text{(Equation 3)}$$

where L is the length of the sequence and HS is the set of host sequences in the alignment. A sequence composed entirely of host strings would have an HSC of 100. One might alternatively say that such a sequence is 100% host. The concept of the HSC quantity represented by Equation 3 is illustrated in FIG. 2c. In alternative embodiments, Equation 3 can be modified further such that the final score is dependent on the relative usage of each host sequence in the alignment. Strings from sequences that are more frequently expressed by hosts are expected to be more tolerized, and therefore may be given correspondingly higher influence in a scoring system.
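The following Python sketch computes HSC as written in Equation 3, under the assumptions that all sequences are pre-aligned, gap-free, and of equal length. The function name `hsc`, the short window of 3, and the toy sequences are hypothetical choices made to keep the arithmetic easy to follow.

```python
def hsc(s, hosts, w=9):
    """Equation 3: host string content of s, on a 0-100 scale."""
    n_windows = len(s) - w + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the window size")
    total = 0
    for i in range(n_windows):
        # IDmax for the window starting at i (Equation 2).
        total += max(
            sum(1 for j in range(i, i + w) if s[j] == h[j])
            for h in hosts
        )
    return 100.0 * total / (n_windows * w)


hosts = ["ACDKF", "GCDEF"]           # hypothetical host set HS
print(hsc("ACDEF", hosts, w=3))      # 100.0: every window matches some host exactly
print(hsc("ACWEF", hosts, w=3))      # about 66.7: each window has one mismatch
```

In the first call, the window ACD is taken from the first host and the windows CDE and DEF from the second, so the score reaches 100 even though the full sequence is identical to neither host.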

In an alternative embodiment, one can measure the exact string content(ESC) as in Equation 3a:

$\begin{matrix}{{{{ESC}(s)} = {{100 \cdot \frac{1}{\left( {L - w + 1} \right) \cdot w}}{\sum\limits_{i = 1}^{L - w + 1}\; {\max\limits_{h \in {HS}}\; \delta_{{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{s},{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{h}}}}}},} & {{Equation}\mspace{14mu} 3a}\end{matrix}$

where the notation aa^(s) _(i…i+w−1) refers to the contiguous sequence string in protein s from position i to position i+w−1. In this embodiment, only perfect matches of size w are counted in the score.

It is worth noting that, since the scoring systems in Equations 3 and 3a are based on local sequence identity and/or similarity evaluated over windows of defined size, a sequence with high HSC can be constructed of sequence segments that are maximally similar to different members of the set of host sequences at different positions.

The above measure of hostness is likely to be more immunologically relevant than the more commonly used global identity measure of equation 4:

$\begin{matrix}{{globalID} = {100 \cdot {\max\limits_{h \in {HS}}{\frac{1}{L}{\sum\limits_{i = 1}^{L}\; \delta_{{aa}_{i}^{s},{aa}_{i}^{h}}}}}}} & {{Equation}\mspace{14mu} 4}\end{matrix}$

Equation 4 disregards the extent of contiguous sequence identity, which is particularly relevant for capturing the molecular behavior of the immune system.

Additional scoring functions similar to Equation 3 are also possible. For example, as will be appreciated by those skilled in the art, there is some uncertainty regarding the hostness of a string wherein IDmax=w−1, w−2, etc. In one alternative embodiment, sequence similarity is compared instead of identity, using any of a variety of amino acid substitution matrices (e.g. PAM, BLOSUM62, etc.), providing a host string similarity (HSS) score as in equation 5:
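The difference between the global identity of Equation 4 and the window-based HSC of Equation 3 can be seen in a small Python sketch. It assumes pre-aligned, ungapped sequences of equal length; the function names and the toy sequences are hypothetical.

```python
from typing import Sequence


def global_id(s: str, hosts: Sequence[str]) -> float:
    """Equation 4: percent identity to the single closest host sequence."""
    return 100.0 * max(sum(a == b for a, b in zip(s, h)) / len(s) for h in hosts)


def hsc(s: str, hosts: Sequence[str], w: int) -> float:
    """Equation 3: window-based host string content (0-based windows)."""
    n = len(s) - w + 1
    total = sum(
        max(sum(a == b for a, b in zip(s[i:i + w], h[i:i + w])) for h in hosts)
        for i in range(n)
    )
    return 100.0 * total / (n * w)


# A chimera built from two toy hosts: each half is identical to one host.
toy_hosts = ["AAAAAA", "CCCCCC"]
chimera = "AAACCC"
print(global_id(chimera, toy_hosts))  # only half the positions match any one host
print(hsc(chimera, toy_hosts, w=3))   # most windows match some host well
```

Here the global identity is 50 while the HSC is roughly 83, because the window-based score credits local matches to different hosts at different positions.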

$\begin{matrix}{{{HSS}(s)} = {{100 \cdot \frac{1}{L - w + 1}}{\sum\limits_{i = 1}^{L - w + 1}\; {\max\limits_{h \in {HS}}\left( ^{\sum\limits_{j = i}^{i + w - 1}\; {({S_{{aa}_{j}^{s},{aa}_{j}^{h}} - S_{{aa}_{j}^{s},{aa}_{j}^{s}}})}} \right)}}}} & {{Equation}\mspace{14mu} 5}\end{matrix}$

where S is a substitution score comparing any two amino acids. In yet another alternative, sequence identities are weighted according to the extent of identity, as in equation 6:

$\begin{matrix}{{{HSC}(s)} = {{100 \cdot \frac{1}{\left( {L - w + 1} \right)}}{\sum\limits_{i = 1}^{L - w + 1}{f\left( \; {\max\limits_{h \in {HS}}\left( {\sum\limits_{j = i}^{i + w - 1}\delta_{{aa}_{j}^{s},{aa}_{j}^{h}}} \right)} \right)}}}} & {{Equation}\mspace{14mu} 6}\end{matrix}$

where f is a continuous or noncontinuous function dependent on IDmax. For example, perfect matches can be weighted greater than near perfect matches (e.g. f(w)=1, f(w−1)=0.5, etc.), and poor matches can be discarded (e.g. f(w−3)=f(w−2)=0).
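The weighting of Equation 6 can be sketched in Python as follows. This is an illustrative sketch only: the weighting function f is represented as a lookup keyed by the number of mismatches (w − IDmax), the default weights follow the example values given above (f(w)=1, f(w−1)=0.5, all poorer matches discarded), and the name `weighted_hsc` is hypothetical. Sequences are assumed to be pre-aligned, ungapped, and of equal length.

```python
from typing import Dict, Optional, Sequence


def weighted_hsc(
    s: str,
    hosts: Sequence[str],
    w: int = 9,
    weights: Optional[Dict[int, float]] = None,
) -> float:
    """Equation 6 with f expressed as a table keyed by mismatches (w - IDmax)."""
    if weights is None:
        weights = {0: 1.0, 1: 0.5}  # f(w) = 1, f(w - 1) = 0.5, everything else 0
    n_windows = len(s) - w + 1
    total = 0.0
    for i in range(n_windows):
        idmax = max(
            sum(a == b for a, b in zip(s[i:i + w], h[i:i + w])) for h in hosts
        )
        total += weights.get(w - idmax, 0.0)  # unlisted mismatch counts score 0
    return 100.0 * total / n_windows


toy_hosts = ["AAAAAA", "CCCCCC"]
print(weighted_hsc("AAACCC", toy_hosts, w=3))  # two perfect and two near-perfect windows
```

Passing a different `weights` table changes how strongly near-perfect strings contribute, which is the tuning freedom that f provides.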

String Window Size

The fundamental binding units of class I and class II MHC proteins are both 9 amino acids. In a preferred embodiment, the window size w used to create and score parent sequences is 9. However, it is also known that additional peptide flanking residues (PFRs) can influence T-cell recognition (via the TCR) of class II MHC-peptide complexes (see for example Arnold et al., 2002, J Immunology 169(2): 739-49), with the residues at positions P-1 (one position before the 1^(st) MHC binding position) and P11 being most influential. Because these effects might influence immune tolerance, a desirable goal of the invention, larger window sizes (e.g. 12) can be used. It should be noted, however, that sequences optimized with similar window sizes are highly correlated.

Optimization of HSC

Although a definition of string scoring systems is useful, an efficient process for discovering sequences with high HSC is also desirable. It is therefore a further aspect of the invention to provide methods for dynamic optimization of HSC given the described scoring systems.

Desirable features of an optimization method include but are not limited to the following: 1) the output sequences are optimal or near-optimal (subject to design constraints) in their host string content; 2) structural constraints can be used to modulate the nature of the optimized sequences; and 3) multiple near-optimal solutions can be generated. Additionally, in some preferred embodiments, host string content may be maximized using a minimal number of substitutions.

In a preferred embodiment, an iterative algorithm for optimization of HSC works as follows. 1) a parent sequence and set of host sequences are defined; 2) mutational constraints are defined at functionally or structurally important positions, referred to herein as masking; in a preferred embodiment, for antibody applications, positions within or structurally proximal to a CDR residue (as defined herein, or alternatively as defined by Kabat or Chothia) and/or an interface are masked, locked, or fixed so that mutations are not possible (in some embodiments this constraint can be relaxed if the potential mutation is a conservative substitution of the parent amino acid). In a preferred embodiment, positions within 5 angstroms of a CDR residue or interface are masked. In other preferred embodiments, positions within 6.5 angstroms of a CDR residue or interface are masked; 3) host sequence segments (up to a defined length: lengths from 1-6 are typical) are collected from the alignment and stored for each position: segments that violate the mutational constraints are not collected; 4) each segment is analyzed for its potential impact on HSC, in the context of the current parent sequence, defined as String Impact (SI) in equation 7:

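$\begin{matrix}{{\mathrm{SI}\left( x_{m}(z) \rightarrow y_{m}(z) \right)} = {\mathrm{HSC}\left( s(y_{m}) \right) - \mathrm{HSC}(\mathrm{parent})},} & {{Equation}\mspace{14mu} 7}\end{matrix}$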

where y_(m)(z) is a host segment of length z replacing segment x at position m, and s(y_(m)) and parent are versions of the parent sequence that include these segments (the parent sequence contains x_(m)(z)). 5) a single string is randomly selected from all stored host strings; the probability of selection is biased and proportional to the impact on HSC and inversely proportional to the number of mutations relative to the current parent sequence, as in Equation 8:

$\begin{matrix}{P \propto \frac{\mathrm{SI}\left( x_{m}(z) \rightarrow y_{m}(z) \right)}{\sum\limits_{i = m}^{m + z - 1}\left( 1 - \delta_{aa_{i}^{parent},\,aa_{i}^{s(y_{m})}} \right)}} & {{Equation}\mspace{14mu} 8}\end{matrix}$

This selected string is substituted into the current parent sequence for its corresponding parent amino acids on an amino acid string by amino acid string basis. This kind of selection bias tends to optimize host string content with minimal perturbation of the original sequence. 6) steps 4 and 5 are repeated until no further optimization is possible (no segment substitutions have a favorable impact on host string content).
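Steps 1 through 6 can be sketched in Python as shown below. This is a minimal sketch under stated assumptions, not a definitive implementation: sequences are pre-aligned, ungapped, and of equal length; a segment is taken to violate the mutational constraints if it would change a masked position; host segments are re-collected on every pass instead of being stored once; and the string impact is measured in raw summed-IDmax units, which differ from the HSC units of Equation 7 only by a constant factor. Following the text, the selection weight divides the impact by the number of positions that actually change. The names `idmax_total` and `optimize_hsc` are hypothetical.

```python
import random
from typing import AbstractSet, Dict, Optional, Sequence


def idmax_total(s: str, hosts: Sequence[str], w: int) -> int:
    """Sum of IDmax over all windows; HSC is this value times 100/((L-w+1)*w)."""
    return sum(
        max(sum(a == b for a, b in zip(s[i:i + w], h[i:i + w])) for h in hosts)
        for i in range(len(s) - w + 1)
    )


def optimize_hsc(
    parent: str,
    hosts: Sequence[str],
    w: int = 9,
    max_seg: int = 6,
    masked: AbstractSet[int] = frozenset(),
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random()
    current = parent
    while True:
        base = idmax_total(current, hosts, w)
        candidates: Dict[str, float] = {}  # variant sequence -> selection weight
        for h in hosts:
            for z in range(1, max_seg + 1):
                for m in range(len(current) - z + 1):
                    changed = [p for p in range(m, m + z) if h[p] != current[p]]
                    # Skip segments that change nothing or touch a masked position.
                    if not changed or any(p in masked for p in changed):
                        continue
                    variant = current[:m] + h[m:m + z] + current[m + z:]
                    impact = idmax_total(variant, hosts, w) - base
                    if impact > 0:  # keep only favorable substitutions
                        candidates[variant] = impact / len(changed)
        if not candidates:  # step 6: stop when nothing improves the score
            return current
        current = rng.choices(
            list(candidates), weights=list(candidates.values()), k=1
        )[0]


toy_hosts = ["AAAAAA", "CCCCCC"]
print(optimize_hsc("AAACCC", toy_hosts, w=3, rng=random.Random(0)))
```

Because every accepted substitution strictly increases a bounded integer score, the loop always terminates. Passing different `random.Random` seeds gives independent runs that may end at different solutions.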

Such an algorithm is inherently non-deterministic, so independent runs of the algorithm will tend to generate different solutions (this is a favorable feature). In a preferred embodiment, such an algorithm is applied numerous times to generate a diverse array of unique solutions. These solutions can be further clustered such that representative sequences can be prioritized for further analysis. For example, in one embodiment, the solution sequences are clustered into groups of similar sequences according to mutational distance, using a nearest neighbor single linkage hierarchical clustering algorithm to assign sequences to related groups based on similarity scores. Clustering algorithms may be useful for classifying sequences into representative groups. Representative groups may be defined, for example, by similarity. Measures of similarity include, but are not limited to, sequence similarity and energetic similarity. Thus the output sequences from computational screening may be clustered around local minima, referred to herein as clustered sets of sequences. Sets of sequences that are close in sequence space may be distinguished from other sets. In one embodiment, diversity across clustered sets of sequences may be sampled by experimentally testing only a subset of sequences within each clustered set. For example, all or most of the clustered sets could be broadly sampled by including the lowest energy sequence from each clustered set of sequences to be experimentally tested. Because the sequence space of solutions with optimized HSC can be large, additional methods can be applied to ensure that a broad set of sequences is created. In a preferred embodiment, individual framework sequences generated by the procedure are clustered separately to generate a list of nonredundant basis framework regions (FRs) with high HSC. These basis FRs are then computationally assembled in all combinations along with the CDRs to generate a secondary list of solution sequences (which will usually have some overlap with the primary set). Alternatively, the basis FRs may be combined into an experimental library, for example a combinatorial library.
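Clustering of solution sequences by mutational distance can be illustrated with the following Python sketch. It is an assumption-laden simplification: mutational distance is taken to be the Hamming distance between equal-length sequences, and the single linkage hierarchy is cut at one fixed threshold, which is equivalent to grouping sequences connected by chains of neighbors within that distance. The function names and the `max_dist` parameter are hypothetical.

```python
from typing import Dict, List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(seqs: Sequence[str], max_dist: int) -> List[List[str]]:
    """Group sequences linked by any chain of pairs within max_dist mutations."""
    parent = list(range(len(seqs)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: Dict[int, List[str]] = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


solutions = ["AAAA", "AAAC", "AACC", "GGGG"]
print(single_linkage_clusters(solutions, max_dist=1))
```

One representative per cluster (for example, the member with the best score under whichever criterion is being used) could then be carried forward for further analysis.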

Framework Diversity

Application of this algorithm will generate variant protein solutions for which HSC is higher than the original parent sequence. It will also frequently generate solutions in which substituted strings are derived from different members of the alignment. The variant sequences derived using the present invention generally have unique properties relative to sequences generated using other methodologies. For example, in the context of an antibody, the protein variants of the invention frequently derive their host string content from a combination of different host germline sequences. This may be true even within a single FR. Quantification of these properties is useful for defining the nature of sequences derived using the present invention. A clear distinction emerges from a comparison of exact string content (meaning a perfect match over window w) in any single germline sequence versus exact w-mer string content within the set of all germline sequences (content of strings for which IDmax=w). Single germline exact string content (SGESC) of a variant sequence v may thus be defined as:

$\begin{matrix}{{{{SGESC}(v)} = {100 \cdot \frac{1}{L - w + 1} \cdot {\max\limits_{h \in {HS}}{\sum\limits_{i = 1}^{L - w + 1}\delta_{{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{v},{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{h}}}}}}\;} & {{Equation}\mspace{14mu} 9}\end{matrix}$

This quantity provides the extent to which a string-optimized sequence has string identity with the closest single germline sequence. Using this definition, it is also possible to assess the extent to which the high host string content of a given variant sequence v is derived from a single germline as opposed to multiple germline sequences. Framework region homogeneity (FRH) is defined as follows:

$\begin{matrix}{{{FRH}(v)} = {\frac{{SGESC}(v)}{{ESC}(v)} = \frac{\max\limits_{h \in {HS}}{\sum\limits_{i = 1}^{L - w + 1}\delta_{{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{v},{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{h}}}}{\sum\limits_{i = 1}^{L - w + 1}{\max\limits_{h \in {HS}}\delta_{{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{s},{aa}_{{i\mspace{14mu} \ldots \mspace{14mu} i} + w - 1}^{h}}}}}} & {{Equation}\mspace{14mu} 10}\end{matrix}$

In other words, if a variant sequence's exact string content is derived solely from a single germline sequence, the FRH would be close to 1.0. It should be noted that a similar or identical quantity can be defined for non-antibody proteins. Alternatively, as is the case with many of the variant sequences created by the present invention, FRH values can be significantly less than 1, with values ranging from 0.4 to 1.0, indicating, as expected, that sequences with high exact string content can be discovered with contributions from multiple germline subfamilies and sequences. As described more fully in Example 5 below, variant sequences generated using the present invention have high HSC values yet many have low FRH values, indicating their HSC is derived from multiple germline frameworks.
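SGESC (Equation 9) and FRH (Equation 10) can be computed with a short Python sketch. It assumes pre-aligned, ungapped sequences of equal length and 0-based windows; FRH is computed directly as the ratio of window counts in Equation 10, so it does not depend on how the two scores are normalized. The function names are hypothetical.

```python
from typing import List, Sequence


def exact_hits(v: str, h: str, w: int) -> List[bool]:
    """For each window, True if v matches host h exactly over all w positions."""
    return [v[i:i + w] == h[i:i + w] for i in range(len(v) - w + 1)]


def sgesc(v: str, hosts: Sequence[str], w: int = 9) -> float:
    """Equation 9: percent of windows matched exactly by the best single host."""
    n_windows = len(v) - w + 1
    return 100.0 * max(sum(exact_hits(v, h, w)) for h in hosts) / n_windows


def frh(v: str, hosts: Sequence[str], w: int = 9) -> float:
    """Equation 10: exact windows from the best single host, divided by
    exact windows found in any host."""
    per_host = [exact_hits(v, h, w) for h in hosts]
    single_best = max(sum(hits) for hits in per_host)
    any_host = sum(any(column) for column in zip(*per_host))
    if any_host == 0:
        raise ValueError("no window of v matches any host exactly")
    return single_best / any_host


toy_hosts = ["AAAAAA", "CCCCCC"]
print(frh("AAAAAA", toy_hosts, w=3))  # all exact strings come from one host
print(frh("AAACCC", toy_hosts, w=3))  # exact strings split between two hosts
```

In the toy example the chimera has an FRH of 0.5, reflecting that its exact strings are drawn equally from two different host sequences.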

Additional Scoring

The above methods of scoring use the information present in an aligned set of host sequences as a metric of immunogenicity to maximize the content of host linear sequence strings in a parent sequence. In addition to such scoring functions, other scoring functions and methods may be employed. Such additional scoring functions may be aimed at the same goal as the aforementioned linear string scoring function, namely immunogenicity reduction of the parent protein. Alternatively, such additional scoring functions may be used to achieve other goals, for example optimization of protein stability, solubility, expression, pharmacokinetics, and/or aspects of protein function such as affinity of the parent protein for a target ligand, specificity, effector function, and/or enzymatic activity. For example an additional scoring function may be employed to enhance the affinity of an antibody variable domain for its target antigen. Such additional scoring functions may be employed statically or dynamically for the generation of optimized protein variants. A number of embodiments are described below as preferred additional scoring functions that may be used with the aforementioned linear string scoring method of the present invention. However, these are not meant to constrain the invention to these embodiments, and it should be clear that any method of scoring the fitness of an amino acid modification in a parent protein may be coupled with the novel linear string scoring method of the present invention so that optimal protein variants may be designed.

In a preferred embodiment, substitutions are scored based on their structural compatibility with the structure of the parent protein. Such methods of scoring may require the structural coordinates that describe the three-dimensional structure of the protein, for example as obtained by X-ray crystallographic and nuclear magnetic resonance (NMR) techniques. Suitable protein structures may also be obtained from structural models, which may be generated by methods that are known in the art of structural biology, including but not limited to de novo and homology modeling. Structure-based scoring functions may include any number of potentials that describe or approximate physical or chemical energy terms, including but not limited to a van der Waals potential, a hydrogen bond potential, an atomic solvation potential or other solvation models, a secondary structure propensity potential, an electrostatic potential, a torsional potential, an entropy potential, and/or additional energy terms. In other preferred embodiments, scoring methods may also be derived from sequence information, including but not limited to knowledge-based potentials derived from protein sequence and/or structure statistics, threading potentials, reference energies, pseudo energies, homology-based energies, and sequence biases derived from sequence alignments. In alternately preferred embodiments, both structural and sequence-based potentials are used to generate one or more scoring functions that may be coupled with the linear string scoring method of the present invention.

In a most preferred embodiment, a scoring method is used wherein the structural and functional integrity of substitutions are evaluated using a sequence and structure-based scoring function described in U.S. Ser. No. 60/528,229, filed Dec. 8, 2003, entitled Protein Engineering with Analogous Contact Environments; and U.S. Ser. No. 60/602,566, filed Aug. 17, 2004, entitled Protein Engineering with Analogous Contact Environments. This method combines sequence alignment information and structural information to predict the structural compatibility of one or more substitutions with a protein structure template. Nearest neighbor structure-based scores generated by this method include Structural Consensus and Structural Precedence as provided in the Examples. This method is particularly well suited for application to evaluating the structural fitness of immunoglobulins due to their substantial sequence and structural homology.

In a preferred embodiment, substitutions are scored using a scoring function or computational design program that is substantially similar to Protein Design Automation® (PDA®) technology, as is described in U.S. Pat. No. 6,188,965; U.S. Pat. No. 6,269,312; U.S. Pat. No. 6,403,312; U.S. Pat. No. 6,708,120; U.S. Pat. No. 6,804,611; U.S. Pat. No. 6,792,356; U.S. Ser. No. 09/782,004; U.S. Ser. No. 09/812,034; U.S. Ser. No. 09/927,790; U.S. Ser. No. 10/218,102; U.S. Ser. No. 10/101,499; U.S. Ser. No. 10/218,102; U.S. Ser. No. 10/666,311; U.S. Ser. No. 10/665,307; U.S. Ser. No. 10/888,748; PCT WO 98/07254; PCT WO 99/24229; PCT WO 01/40091; and PCT WO 02/25588. In another preferred embodiment, a computational design method substantially similar to Sequence Prediction Algorithm™ (SPA™) technology is used, as is described in (Raha et al., 2000, Protein Sci. 9: 1106-1119), U.S. Ser. No. 09/877,695, and U.S. Ser. No. 10/071,859. In another preferred embodiment, the computational methods described in U.S. Ser. No. 10/339,788 are used.

In another preferred embodiment, optimized sequences are also assessed for surface similarity with host antibodies. Ensuring similarity may be important for reducing the probability of introducing novel 3D epitopes, which are potentially recognized by B-cell receptors. In a preferred embodiment, surface similarity at position i is quantified as follows:

$\begin{matrix}{{{surfscore}(i)} = {\max\limits_{k \in {HS}}^{f_{i}^{\exp} \cdot {{(\begin{matrix}{{({\sum\limits_{j = 1}^{L}{{{proximity}{({i,j})}}*{S{({{aa}_{j}^{s},{aa}_{j}^{k}})}}}})} -} \\{({\sum\limits_{j = 1}^{L}{{{proximity}{({i,j})}}*{S{({{aa}_{j}^{s},{aa}_{j}^{s}})}}}})}\end{matrix})}/T}}}} & {{Equation}\mspace{14mu} 11}\end{matrix}$

where f_(i) ^(exp) is the fractional accessibility of position i to solvent, proximity(i,j) is the spatial proximity of positions i and j in the three-dimensional structure of the protein, S is a measure of amino acid similarity, and T is a temperature factor used to tune the stringency of the similarity comparison. It will be appreciated from the equation that if sequences are identical in the region of position i, a surfscore of 1.0 will be approached. Alternatively, a score of 1.0 can also be achieved if a position is completely buried (i.e. f_(i) ^(exp)=0), since the position would not be accessible to B-cell receptors. Lower scores represent surface positions for which there are significant differences between the variant sequence and the most similar host sequence. In a preferred embodiment, the proximity between two positions is inversely related to their distance (e.g. a Gaussian or exponential function of the distance), and the proximity of a position to itself is 1.0. In a preferred embodiment, the decay of the proximity function is tuned such that patches of positions correspond to the size of a typical antibody epitope.

Other surface properties may also be desirable. For example, optimized variant sequences may be assessed for the exposure of nonpolar amino acids, which is generally expected to decrease solubility. In such cases, variant sequences with lower nonpolar exposure can be prioritized over alternatives. Surface electrostatic properties may also be assessed for variant sequences. In a preferred embodiment, surface properties are assessed for multiple variants with optimized HSC, in order to limit the set of variants that will be experimentally screened.
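The surface similarity score can be sketched in Python as below, reading Equation 11 as the maximum over host sequences of an exponential of the exposure-weighted similarity difference divided by T. Several choices here are assumptions made only for illustration: a Gaussian proximity function with a hypothetical width `sigma`, a simple identity-based similarity measure S (a substitution matrix could be supplied instead), and a precomputed pairwise distance matrix in angstroms. None of the names or numerical values come from the specification.

```python
import math
from typing import Callable, Sequence


def proximity(d: float, sigma: float = 8.0) -> float:
    """Gaussian decay with distance; a position's proximity to itself is 1.0."""
    return math.exp(-((d / sigma) ** 2))


def identity_similarity(a: str, b: str) -> float:
    """Stand-in similarity measure S: 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


def surfscore(
    i: int,
    s: str,
    hosts: Sequence[str],
    exposure: Sequence[float],
    dist: Sequence[Sequence[float]],
    S: Callable[[str, str], float] = identity_similarity,
    T: float = 1.0,
    sigma: float = 8.0,
) -> float:
    def weighted(other: str) -> float:
        # Proximity-weighted similarity between s and another aligned sequence.
        return sum(
            proximity(dist[i][j], sigma) * S(s[j], other[j]) for j in range(len(s))
        )

    self_term = weighted(s)  # similarity of s to itself (the reference value)
    return max(
        math.exp(exposure[i] * (weighted(k) - self_term) / T) for k in hosts
    )


toy_dist = [[0.0, 4.0, 8.0], [4.0, 0.0, 4.0], [8.0, 4.0, 0.0]]
print(surfscore(2, "ACD", ["ACD"], [1.0, 1.0, 1.0], toy_dist))  # identical host
print(surfscore(2, "ACD", ["ACE"], [1.0, 1.0, 1.0], toy_dist))  # exposed mismatch
print(surfscore(2, "ACD", ["ACE"], [1.0, 1.0, 0.0], toy_dist))  # buried mismatch
```

The three calls show the behaviors described in the text: an identical host gives 1.0, an exposed difference lowers the score, and a fully buried position gives 1.0 regardless of the difference.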

In another embodiment, substitution matrices or other knowledge-based scoring methods are used to identify alternate sequences that are likely to retain the structure and function of the protein. Such scoring methods can be used to quantify how conservative a given substitution or set of substitutions is. In most cases, conservative mutations do not significantly disrupt the structure and function of proteins (see for example, Bowie et al., 1990, Science 247: 1306-1310, Bowie & Sauer, 1989, Proc. Nat. Acad. Sci. USA 86: 2152-2156, and Reidhaar-Olson & Sauer, 1990, Proteins 7: 306-316). However, non-conservative mutations can destabilize protein structure and reduce activity (see for example, Lim et al., 1992, Biochem. 31: 4324-4333). Substitution matrices including but not limited to BLOSUM62 provide a quantitative measure of the compatibility between a sequence and a target structure, which can be used to predict non-disruptive substitution mutations (Topham et al., 1997, Prot. Eng. 10: 7-21). The use of substitution matrices to design peptides with improved properties has been disclosed (Adenot et al., 1999, J. Mol. Graph. Model. 17: 292-309). Substitution matrices include, but are not limited to, the BLOSUM matrices (Henikoff & Henikoff, 1992, Proc. Nat. Acad. Sci. USA 89: 10917), the PAM matrices, the Dayhoff matrix, and the like. For a review of substitution matrices, see for example Henikoff, 1996, Curr. Opin. Struct. Biol. 6: 353-360. It is also possible to construct a substitution matrix based on an alignment of a given protein of interest and its homologs; see for example Henikoff & Henikoff, 1996, Comput. Appl. Biosci. 12: 135-143.

In a preferred embodiment, other methods for scoring immunogenicity may additionally be used. Most preferably, immunogenicity may be scored using a function that considers peptide binding to one or more MHC molecules. For example, substitutions would be scored such that there are no or a minimal number of immune epitopes that are predicted to bind, with high affinity, to any prevalent MHC alleles. These methods of scoring may be useful, for example, for designing substitutions in VH CDR3, for which scoring using human germline strings may be less straightforward. Several methods of identifying MHC-binding epitopes in protein sequences are known in the art and may be used to score epitopes in an antibody. See for example WO 98/52976; WO 02/079232; WO 00/3317; U.S. Ser. No. 09/903,378; U.S. Ser. No. 10/039,170; U.S. Ser. No. 60/222,697; U.S. Ser. No. 10/339,788; PCT WO 01/21823; and PCT WO 02/00165; Mallios, 1999, Bioinformatics 15: 432-439; Mallios, 2001, Bioinformatics 17: 942-948; Sturniolo et al., 1999, Nature Biotech. 17: 555-561; WO 98/59244; WO 02/069232; WO 02/77187; Marshall et al., 1995, J. Immunol. 154: 5927-5933; and Hammer et al., 1994, J. Exp. Med. 180: 2353-2358. Sequence-based information can be used to determine a binding score for a given peptide-MHC interaction (see for example Mallios, 1999, Bioinformatics 15: 432-439; Mallios, 2001, Bioinformatics 17: 942-948; Sturniolo et al., 1999, Nature Biotech. 17: 555-561). It is possible to use structure-based methods in which a given peptide is computationally placed in the peptide-binding groove of a given MHC molecule and the interaction energy is determined (for example, see WO 98/59244 and WO 02/069232). Such methods may be referred to as “threading” methods. Alternatively, purely experimental methods can be used; for example a set of overlapping peptides derived from the protein of interest can be experimentally tested for the ability to induce T-cell activation and/or other aspects of an immune response (see for example WO 02/77187).

In a preferred embodiment, MHC-binding propensity scores are calculated for each 9-residue frame along the protein sequence using a matrix method (see Sturniolo et al., supra; Marshall et al., 1995, J. Immunol. 154: 5927-5933, and Hammer et al., 1994, J. Exp. Med. 180: 2353-2358). It is also possible to consider scores for only a subset of these residues, or to consider also the identities of the peptide residues before and after the 9-residue frame of interest. The matrix comprises binding scores for specific amino acids interacting with the peptide binding pockets in different human class II MHC molecules. In the most preferred embodiment, the scores in the matrix are obtained from experimental peptide binding studies. In an alternate preferred embodiment, scores for a given amino acid binding to a given pocket are extrapolated from experimentally characterized alleles to additional alleles with identical or similar residues lining that pocket. Matrices that are produced by extrapolation are referred to as “virtual matrices”.
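A matrix method of this general kind can be illustrated with the Python sketch below, which sums per-pocket scores over each 9-residue frame of a sequence. The matrix `TOY_MATRIX` is entirely hypothetical: its values are invented for illustration and are not taken from any published or experimental matrix, and the choice to score only four pocket positions (offsets 0, 3, 5, and 8 within the frame) is likewise an assumption. The function names and the threshold are also hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical pocket matrix for a single allele. Keys are 0-based offsets
# within the 9-residue frame; the scores are invented example values.
TOY_MATRIX: Dict[int, Dict[str, float]] = {
    0: {"F": 2.0, "L": 1.5, "Y": 1.0},
    3: {"D": 1.0, "E": 0.5},
    5: {"T": 0.8, "S": 0.6},
    8: {"L": 1.0, "V": 0.7},
}


def frame_scores(
    seq: str, matrix: Dict[int, Dict[str, float]], frame: int = 9
) -> List[float]:
    """Score every frame as the sum of its pocket scores (unlisted residues score 0)."""
    return [
        sum(pocket.get(seq[i + off], 0.0) for off, pocket in matrix.items())
        for i in range(len(seq) - frame + 1)
    ]


def predicted_binders(
    seq: str, matrix: Dict[int, Dict[str, float]], threshold: float, frame: int = 9
) -> List[Tuple[int, str]]:
    """Return (start, peptide) for frames whose score meets the threshold."""
    return [
        (i, seq[i:i + frame])
        for i, score in enumerate(frame_scores(seq, matrix, frame))
        if score >= threshold
    ]


toy_seq = "GFAADATAALG"
print(frame_scores(toy_seq, TOY_MATRIX))
print(predicted_binders(toy_seq, TOY_MATRIX, threshold=3.0))
```

In practice one such matrix would be evaluated per MHC allele of interest, and a variant would be preferred when few or no frames exceed the chosen threshold for prevalent alleles.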

In alternate embodiments, additional scoring functions are employed that predict reactive sites within a protein, such as deamidation sites, glycosylation sites, oxidation sites, proteolytic cleavage sites, and the like.

It will be appreciated by one of skill in the art that the use of combinations of any of the aforementioned scoring functions and/or other scoring functions is contemplated. In one embodiment, this could be accomplished by evaluating the results from separate calculations. Alternatively, scoring functions may be combined into one scoring term. This latter strategy enables different scoring terms to be weighted separately, thus providing more control over the relative contributions of the scoring terms, and a greater capacity to tune the scoring function for a desired engineering strategy.

Additional Optimization of Sequences

In a preferred embodiment, after optimized protein variants have been engineered using the aforementioned scoring functions, additional optimization of protein variants may be carried out. In this way, an optimized protein variant can be thought of as a primary variant or template for further optimization, and variants of this primary variant can be thought of as secondary variants. Because variant sequences of the invention are preferably derived from an HSC-increasing procedure in which substitution of structurally important positions is disallowed or discouraged (for example masking), it is likely that additional optimization of HSC is possible if those positions are allowed to vary in a secondary analysis. Optimization of other properties is possible, including but not limited to protein affinity, expression, specificity, solubility, activity, and effector function. Thus the variant sequences derived in the primary analysis can represent variants for further optimization. In a preferred embodiment, the secondary analysis comprises the steps of 1) string analysis of the template sequence to identify secondary amino acid diversity that will have neutral, positive, or minimal impact on HSC; 2) experimental production of secondary variant sequences using the diversity derived in step 1; and 3) experimental screening of secondary variant sequences.

For these purposes, the string impact (SI) of a single substitution at position m from amino acid x to amino acid y can be quantified as in Equation 7 (where segment length z=1). As will be appreciated, the maximum possible string increase (for a single substitution at an internal position) is w and the maximum possible decrease is w.

In a most preferred embodiment, substitutions in a primary variant are chosen as those substitutions that will result in zero or positive string impact, referred to herein as string neutral and string positive substitutions respectively. In other embodiments, substitutions that result in negative string impact, i.e. string negative substitutions, may also be considered for engineering secondary variants. Secondary variants then may be constructed, expressed, and tested experimentally. Secondary substitutions that show favorable properties with respect to antigen affinity, effector function, stability, solubility, expression, and the like, may be combined in subsequent variants to generate a more optimized therapeutic candidate.
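The classification of single substitutions by string impact can be sketched in Python as follows. The sketch assumes pre-aligned, ungapped sequences of equal length and 0-based positions, and it reports the impact in raw summed-IDmax units, so that a single substitution can change the value by at most w in either direction; multiplying by 100/((L−w+1)·w) converts it to the HSC units of Equation 7. The function names and toy sequences are hypothetical.

```python
from typing import Sequence


def idmax_total(s: str, hosts: Sequence[str], w: int) -> int:
    """Sum of IDmax over all windows of s."""
    return sum(
        max(sum(a == b for a, b in zip(s[i:i + w], h[i:i + w])) for h in hosts)
        for i in range(len(s) - w + 1)
    )


def string_impact(s: str, hosts: Sequence[str], m: int, y: str, w: int = 9) -> int:
    """Change in summed IDmax when position m of s is replaced by amino acid y."""
    variant = s[:m] + y + s[m + 1:]
    return idmax_total(variant, hosts, w) - idmax_total(s, hosts, w)


def classify(impact: int) -> str:
    if impact > 0:
        return "string positive"
    if impact == 0:
        return "string neutral"
    return "string negative"


toy_hosts = ["AAAAAA", "CCCCCC"]
template = "AAAACC"
for pos, aa in [(4, "A"), (3, "C"), (0, "G")]:
    si = string_impact(template, toy_hosts, pos, aa, w=3)
    print(pos, aa, si, classify(si))
```

String neutral and string positive substitutions found this way could serve as the secondary diversity to be produced and screened experimentally.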

Optimization of Non-Xenogeneic Proteins

It can be appreciated that the optimization described above is not restricted to immunogenicity reduction of xenogeneic proteins. Evaluation of potential substitutions for string impact provides an excellent strategy for generating substitution diversity for engineering protein variants with optimized properties. A clear advantage of this approach is that it generates protein variants with minimal immunogenicity risk, the importance of which has been discussed extensively and is a primary goal of the present invention. An additional advantage of this approach is that because the sequences being used to evaluate string impact are typically derived from a set of naturally evolved host sequences, variants designed are effectively enriched for stability, solubility, and other favorable properties. The utility of this capability lies in the fact that there are innumerable amino acid modifications that are detrimental or deleterious to proteins. By screening a quality set of variant diversity, the chances are increased that a protein variant of the desired property will be obtained. The capacity of the string impact approach to generate a quality set of variant diversity derives from the greater tolerance to mutation of positions which sample greater diversity, and the greater propensity of amino acids in a set of naturally evolved sequences to be compatible with a homologous protein's structure, stability, solubility, function, and the like.

This string impact approach to variant design may be applied not only to the generation of secondary variants as described above, but may also be used to engineer amino acid modifications in proteins that are presumably already minimally immunogenic. This may include, for example, natural host proteins. Alternatively, and in a preferred embodiment, the string impact strategy may be applied to engineer modifications in an antibody variable region (VH or VL) that is humanized (Clark, 2000, Immunol Today 21:397-402), or “fully human” as obtained for example using transgenic mice (Bruggemann et al., 1997, Curr Opin Biotechnol 8:455-458) or human antibody libraries coupled with selection methods (Griffiths et al., 1998, Curr Opin Biotechnol 9:102-108). As with optimization of primary variant sequences described above, the string impact analysis described here can be used to identify secondary diversity that will have neutral, positive, or minimal impact on HSC, as well as potentially other favorable properties. Such diversity can then be used to screen for optimized versions of these sequences without increasing the risk of immunogenicity.

Experimental Production, Screening, and Testing

Methods for production and screening of protein variants are well knownin the art. General methods for antibody molecular biology, expression,purification, and screening are described in Antibody Engineering,edited by Duebel & Kontermann, Springer-Verlag, Heidelberg, 2001; andHayhurst & Georgiou, 2001, Curr Opin Chem Biol 5:683-689; Maynard &Georgiou, 2000, Annu Rev Biomed Eng 2:339-76. Also see the methodsdescribed in U.S. Ser. No. 10/339,788, filed on Mar. 3, 2003, U.S. Ser.No. 10/672,280, filed Sep. 29, 2003, and U.S. Ser. No. 10/822,231, filedMar. 26, 2004.

In one embodiment of the present invention, the library sequences areused to create nucleic acids that encode the member sequences, and thatmay then be cloned into host cells, expressed and assayed, if desired.These practices are carried out using well-known procedures, and avariety of methods that may find use in the present invention aredescribed in Molecular Cloning—A Laboratory Manual, 3^(rd) Ed.(Maniatis, Cold Spring Harbor Laboratory Press, New York, 2001), andCurrent Protocols in Molecular Biology (John Wiley & Sons). The nucleicacids that encode the protein variants of the present invention may beincorporated into an expression vector in order to express the protein.Expression vectors typically comprise a protein operably linked, that isplaced in a functional relationship, with control or regulatorysequences, selectable markers, any fusion partners, and/or additionalelements. The protein variants of the present invention may be producedby culturing a host cell transformed with nucleic acid, preferably anexpression vector, containing nucleic acid encoding the proteinvariants, under the appropriate conditions to induce or cause expressionof the protein. A wide variety of appropriate host cells may be used,including but not limited to mammalian cells, bacteria, insect cells,and yeast. For example, a variety of cell lines that may find use in thepresent invention are described in the ATCC cell line catalog, availablefrom the American Type Culture Collection. The methods of introducingexogenous nucleic acid into host cells are well known in the art, andwill vary with the host cell used.

In a preferred embodiment, protein variants are purified or isolated after expression. Proteins may be isolated or purified in a variety of ways known to those skilled in the art. Standard purification methods include chromatographic, electrophoretic, immunological, precipitation, dialysis, filtration, concentration, and chromatofocusing techniques. As is well known in the art, a variety of natural proteins bind antibodies, for example bacterial proteins A, G, and L, and these proteins may find use in the present invention for purification. Purification can often be enabled by a particular fusion partner. For example, proteins may be purified using glutathione resin if a GST fusion is employed, Ni²⁺ affinity chromatography if a His-tag is employed, or immobilized anti-flag antibody if a flag-tag is used. For general guidance in suitable purification techniques, see Protein Purification: Principles and Practice, 3rd Ed., Scopes, Springer-Verlag, NY, 1994.

Protein variants may be screened using a variety of methods, including but not limited to those that use in vitro assays, in vivo and cell-based assays, and selection technologies. Automation and high-throughput screening technologies may be utilized in the screening procedures. Screening may employ the use of a fusion partner or label, for example an immune label, isotopic label, or small molecule label such as a fluorescent or colorimetric dye.

In a preferred embodiment, the functional and/or biophysical properties of protein variants are screened in an in vitro assay. In a preferred embodiment, the protein is screened for functionality, for example its ability to catalyze a reaction or its binding affinity to its target. Binding assays can be carried out using a variety of methods known in the art, including but not limited to FRET (Fluorescence Resonance Energy Transfer) and BRET (Bioluminescence Resonance Energy Transfer)-based assays, AlphaScreen™ (Amplified Luminescent Proximity Homogeneous Assay), Scintillation Proximity Assay, ELISA (Enzyme-Linked Immunosorbent Assay), SPR (Surface Plasmon Resonance, also known as BIACORE®), isothermal titration calorimetry, differential scanning calorimetry, gel electrophoresis, and chromatography including gel filtration. These and other methods may take advantage of some fusion partner or label. Assays may employ a variety of detection methods including but not limited to chromogenic, fluorescent, luminescent, or isotopic labels. The biophysical properties of proteins, for example stability and solubility, may be screened using a variety of methods known in the art. Protein stability may be determined by measuring the thermodynamic equilibrium between folded and unfolded states. For example, protein variants of the present invention may be unfolded using chemical denaturant, heat, or pH, and this transition may be monitored using methods including but not limited to circular dichroism spectroscopy, fluorescence spectroscopy, absorbance spectroscopy, NMR spectroscopy, calorimetry, and proteolysis. As will be appreciated by those skilled in the art, the kinetic parameters of the folding and unfolding transitions may also be monitored using these and other techniques. The solubility and overall structural integrity of a protein variant may be quantitatively or qualitatively determined using a wide range of methods that are known in the art. Methods which may find use in the present invention for characterizing the biophysical properties of protein variants include gel electrophoresis, chromatography such as size exclusion chromatography and reversed-phase high performance liquid chromatography, mass spectrometry, ultraviolet absorbance spectroscopy, fluorescence spectroscopy, circular dichroism spectroscopy, isothermal titration calorimetry, differential scanning calorimetry, analytical ultra-centrifugation, dynamic light scattering, proteolysis, cross-linking, turbidity measurement, filter retardation assays, immunological assays, fluorescent dye binding assays, protein-staining assays, microscopy, and detection of aggregates via ELISA or other binding assay. Structural analysis employing X-ray crystallographic techniques and NMR spectroscopy may also find use.

In a preferred embodiment, protein variants are screened using one or more cell-based or in vivo assays. For such assays, purified or unpurified proteins are typically added exogenously such that cells are exposed to individual variants or pools of variants belonging to a library. These assays are typically, but not always, based on the function of the protein; that is, the ability of the protein to bind to its target and mediate some biochemical event, for example effector function, ligand/receptor binding inhibition, apoptosis, and the like. Such assays often involve monitoring the response of cells to the protein, for example cell survival, cell death, change in cellular morphology, or transcriptional activation such as cellular expression of a natural gene or reporter gene. For example, such assays may measure the ability of antibody variants to elicit ADCC, ADCP, or CDC. For some assays additional cells or components, that is, in addition to the target cells, may need to be added, for example serum complement, or effector cells such as peripheral blood mononuclear cells (PBMCs), NK cells, macrophages, and the like. Such additional cells may be from any organism, preferably humans, mice, rats, rabbits, and monkeys. Proteins may cause apoptosis of certain cell lines expressing the target, or they may mediate attack on target cells by immune cells which have been added to the assay. Methods for monitoring cell death or viability are known in the art, and include the use of dyes, immunochemical, cytochemical, and radioactive reagents. For example, caspase staining assays may enable apoptosis to be measured, and uptake or release of radioactive substrates or fluorescent dyes such as alamar blue may enable cell growth or activation to be monitored. In a preferred embodiment, the DELFIA® EuTDA-based cytotoxicity assay (Perkin Elmer, Mass.) is used. Alternatively, dead or damaged target cells may be monitored by measuring the release of one or more natural intracellular proteins, for example lactate dehydrogenase. Transcriptional activation may also serve as a method for assaying function in cell-based assays. In this case, response may be monitored by assaying for natural genes or proteins which may be upregulated, for example the release of certain interleukins may be measured, or alternatively readout may be via a reporter construct. Cell-based assays may also involve the measure of morphological changes of cells as a response to the presence of a protein. Cell types for such assays may be prokaryotic or eukaryotic, and a variety of cell lines that are known in the art may be employed. Alternatively, cell-based screens are performed using cells that have been transformed or transfected with nucleic acids encoding the variant proteins. That is, protein variants are not added exogenously to the cells. For example, in one embodiment, the cell-based screen utilizes cell surface display. A fusion partner can be employed that enables display of variants on the surface of cells (Wittrup, 2001, Curr Opin Biotechnol, 12:395-399).

As is known in the art, a subset of screening methods are those that select for favorable members of a library. The methods are herein referred to as “selection methods”, and these methods find use in the present invention for screening protein variants. When protein libraries are screened using a selection method, only those members of a library that are favorable, that is, which meet some selection criteria, are propagated, isolated, and/or observed. As will be appreciated, because only the most fit variants are observed, such methods enable the screening of libraries that are larger than those screenable by methods that assay the fitness of library members individually. Selection is enabled by any method, technique, or fusion partner that links, covalently or noncovalently, the phenotype of a protein with its genotype, i.e., the function of a protein with the nucleic acid that encodes it. For example, the use of phage display as a selection method is enabled by the fusion of library members to the gene III protein. In this way, selection or isolation of protein variants that meet some criteria, for example binding affinity to the protein's target, also selects for or isolates the nucleic acid that encodes it. Once isolated, the gene or genes encoding variants may then be amplified. This process of isolation and amplification, referred to as panning, may be repeated, allowing favorable protein variants in the library to be enriched. Nucleic acid sequencing of the attached nucleic acid ultimately allows for gene identification.

A variety of selection methods are known in the art that may find use in the present invention for screening protein libraries. These include but are not limited to phage display (Phage display of peptides and proteins: a laboratory manual, Kay et al., 1996, Academic Press, San Diego, Calif.; Lowman et al., 1991, Biochemistry 30:10832-10838; Smith, 1985, Science 228:1315-1317) and its derivatives such as selective phage infection (Malmborg et al., 1997, J Mol Biol 273:544-551), selectively infective phage (Krebber et al., 1997, J Mol Biol 268:619-630), and delayed infectivity panning (Benhar et al., 2000, J Mol Biol 301:893-904), cell surface display (Wittrup, 2001, Curr Opin Biotechnol, 12:395-399) such as display on bacteria (Georgiou et al., 1997, Nat Biotechnol 15:29-34; Georgiou et al., 1993, Trends Biotechnol 11:6-10; Lee et al., 2000, Nat Biotechnol 18:645-648; Jun et al., 1998, Nat Biotechnol 16:576-80), yeast (Boder & Wittrup, 2000, Methods Enzymol 328:430-44; Boder & Wittrup, 1997, Nat Biotechnol 15:553-557), and mammalian cells (Whitehorn et al., 1995, Bio/technology 13:1215-1219), as well as in vitro display technologies (Amstutz et al., 2001, Curr Opin Biotechnol 12:400-405) such as polysome display (Mattheakis et al., 1994, Proc Natl Acad Sci USA 91:9022-9026), ribosome display (Hanes et al., 1997, Proc Natl Acad Sci USA 94:4937-4942), mRNA display (Roberts & Szostak, 1997, Proc Natl Acad Sci USA 94:12297-12302; Nemoto et al., 1997, FEBS Lett 414:405-408), and ribosome-inactivation display system (Zhou et al., 2002, J Am Chem Soc 124:538-543).

Other selection methods that may find use in the present invention include methods that do not rely on display, such as in vivo methods including but not limited to periplasmic expression and cytometric screening (Chen et al., 2001, Nat Biotechnol 19:537-542), the protein fragment complementation assay (Johnsson & Varshavsky, 1994, Proc Natl Acad Sci USA 91:10340-10344; Pelletier et al., 1998, Proc Natl Acad Sci USA 95:12141-12146), and the yeast two hybrid screen (Fields & Song, 1989, Nature 340:245-246) used in selection mode (Visintin et al., 1999, Proc Natl Acad Sci USA 96:11723-11728). In an alternate embodiment, selection is enabled by a fusion partner that binds to a specific sequence on the expression vector, thus linking covalently or noncovalently the fusion partner and associated variant library member with the nucleic acid that encodes them. For example, U.S. Ser. No. 09/642,574; U.S. Ser. No. 10/080,376; U.S. Ser. No. 09/792,630; U.S. Ser. No. 10/023,208; U.S. Ser. No. 09/792,626; U.S. Ser. No. 10/082,671; U.S. Ser. No. 09/953,351; U.S. Ser. No. 10/097,100; U.S. Ser. No. 60/366,658; PCT WO 00/22906; PCT WO 01/49058; PCT WO 02/04852; PCT WO 02/04853; PCT WO 02/08023; PCT WO 01/28702; and PCT WO 02/07466 describe such a fusion partner and technique that may find use in the present invention. In an alternative embodiment, in vivo selection can occur if expression of the protein imparts some growth, reproduction, or survival advantage to the cell.

A subset of selection methods referred to as “directed evolution” methods are those that include the mating or breeding of favorable sequences during selection, sometimes with the incorporation of new mutations. As will be appreciated by those skilled in the art, directed evolution methods can facilitate identification of the most favorable sequences in a library, and can increase the diversity of sequences that are screened. A variety of directed evolution methods are known in the art that may find use in the present invention for screening protein variants, including but not limited to DNA shuffling (PCT WO 00/42561 A3; PCT WO 01/70947 A3), exon shuffling (U.S. Pat. No. 6,365,377; Kolkman & Stemmer, 2001, Nat Biotechnol 19:423-428), family shuffling (Crameri et al., 1998, Nature 391:288-291; U.S. Pat. No. 6,376,246), RACHIT™ (Coco et al., 2001, Nat Biotechnol 19:354-359; PCT WO 02/06469), STEP and random priming of in vitro recombination (Zhao et al., 1998, Nat Biotechnol 16:258-261; Shao et al., 1998, Nucleic Acids Res 26:681-683), exonuclease mediated gene assembly (U.S. Pat. No. 6,352,842; U.S. Pat. No. 6,361,974), Gene Site Saturation Mutagenesis™ (U.S. Pat. No. 6,358,709), Gene Reassembly™ (U.S. Pat. No. 6,358,709), SCRATCHY (Lutz et al., 2001, Proc Natl Acad Sci USA 98:11248-11253), DNA fragmentation methods (Kikuchi et al., Gene 236:159-167), single-stranded DNA shuffling (Kikuchi et al., 2000, Gene 243:133-137), and AMEsystem™ directed evolution protein engineering technology (Applied Molecular Evolution) (U.S. Pat. No. 5,824,514; U.S. Pat. No. 5,817,483; U.S. Pat. No. 5,814,476; U.S. Pat. No. 5,763,192; U.S. Pat. No. 5,723,323).

In a preferred embodiment, the immunogenicity of the protein variants is determined experimentally to confirm that the variants do have reduced or eliminated immunogenicity relative to the parent protein. Several methods can be used for experimental confirmation of epitopes. In a preferred embodiment, ex vivo T-cell activation assays are used to experimentally quantitate immunogenicity. In this method, antigen presenting cells and naïve T cells from matched donors are challenged with a peptide or whole protein of interest one or more times. Then, T cell activation can be detected using a number of methods, for example by monitoring production of cytokines or measuring uptake of tritiated thymidine. In the most preferred embodiment, interferon gamma production is monitored using Elispot assays (Schmittel et al., 2000, J. Immunol. Meth., 24: 17-24). If sera are available from patients who have raised an immune response to protein, it is possible to detect mature T cells that respond to specific epitopes. In a preferred embodiment, interferon gamma or IL-5 production by activated T-cells is monitored using Elispot assays, although it is also possible to use other indicators of T cell activation or proliferation such as tritiated thymidine incorporation or production of other cytokines. Other suitable T cell assays include those disclosed in Meidenbauer et al., 2000, Prostate 43, 88-100; Schultes & Whiteside, 2003, J. Immunol. Methods 279, 1-15; and Stickler et al., 2000, J. Immunotherapy, 23, 654-660. In a preferred embodiment, the PBMC donors used for the above-described T cell activation assays will comprise class II MHC alleles that are common in patients requiring treatment for protein responsive disorders. For example, for most diseases and disorders, it is desirable to test donors comprising all of the alleles that are prevalent in the population. However, for diseases or disorders that are linked with specific MHC alleles, it may be more appropriate to focus screening on alleles that confer susceptibility to protein responsive disorders. In a preferred embodiment, the MHC haplotypes of PBMC donors or patients that raise an immune response to the wild type or protein variant are compared with the MHC haplotypes of patients who do not raise a response. These data may be used to guide preclinical and clinical studies as well as to aid in identification of patients who will be especially likely to respond favorably or unfavorably to the protein therapeutic.

In an alternate preferred embodiment, immunogenicity is measured in transgenic mouse systems. For example, mice expressing fully or partially human class II MHC molecules may be used. In an alternate embodiment, immunogenicity is tested by administering the protein variants to one or more animals, including rodents and primates, and monitoring for antibody formation. Nonhuman primates with defined MHC haplotypes may be especially useful, as the sequences and hence peptide binding specificities of the MHC molecules in nonhuman primates may be very similar to the sequences and peptide binding specificities of humans. Similarly, genetically engineered mouse models expressing human MHC peptide-binding domains may be used (see for example Sonderstrup et al., 1999, Immunol. Rev. 172: 335-343; and Forsthuber et al., 2001, J. Immunol. 167: 119-125).

The biological properties of the proteins of the present invention may be characterized in cell, tissue, and whole organism experiments. As is known in the art, drugs are often tested in animals, including but not limited to mice, rats, rabbits, dogs, cats, pigs, and monkeys, in order to measure a drug's efficacy for treatment against a disease or disease model, or to measure a drug's pharmacokinetics, toxicity, and other properties. The animals may be referred to as disease models. Therapeutics are often tested in mice, including but not limited to nude mice, SCID mice, xenograft mice, and transgenic mice (including knockins and knockouts). Such experimentation may provide meaningful data for determination of the potential of the protein to be used as a therapeutic. Any organism, preferably mammals, may be used for testing. For example, because of their genetic similarity to humans, monkeys can be suitable therapeutic models, and thus may be used to test the efficacy, toxicity, pharmacokinetics, or other property of the proteins of the present invention. Tests in humans are ultimately required for approval as drugs, and thus of course these experiments are contemplated. Thus the proteins of the present invention may be tested in humans to determine their therapeutic efficacy, toxicity, immunogenicity, pharmacokinetics, and/or other clinical properties.

In one embodiment of the present invention, a variant antibody for a host as compared to a parent antibody includes two or more amino acid substitutions derived from two or more natural antibodies. In this embodiment, a first resultant variant string in the variant antibody is rendered most homologous to a first natural antibody, a second resultant variant string in the variant antibody is rendered most homologous to the corresponding string in a second natural antibody, the substitutions are not in a CDR, and at least one resultant string is not a consensus of homologous natural sequences. In a preferred embodiment, the variant strings in the variant antibody do not include CDR residues. In a further preferred embodiment, the first and second natural antibodies are from different subfamilies.

In another embodiment of the present invention, a variant antibody for a host as compared to a parent antibody includes two or more amino acid substitutions derived from three or more natural antibodies. In this embodiment, a first resultant variant string in the variant antibody is rendered most homologous to a first natural antibody, a second resultant variant string in the variant antibody is rendered most homologous to the corresponding string in a second natural antibody, a third resultant variant string in the variant antibody is rendered most homologous to the corresponding string in a third natural antibody, the substitutions are not in a CDR, and at least one resultant string is not a consensus of homologous natural sequences. In an additional embodiment, the first, second and third natural antibodies are from different subfamilies. In a further additional embodiment, the variant antibody further comprises a fourth resultant variant string that is rendered most homologous to the corresponding string in a fourth natural antibody. In an additional embodiment, the variant antibody includes at least one substitution that is made at a position that is not surface exposed, the first, second and third natural antibodies are from different antibody groups, one of the substitutions is made at a position that is part of the VH/VL interface, and at least one amino acid substitution is not a back mutation.

In other embodiments, the first, second and third natural antibodies are from different antibody groups.

In other embodiments, the variant antibody includes at least one substitution that is not surface exposed.

In other embodiments, at least one of the substitutions is made at a position that is part of the VH/VL interface.

In other embodiments, the variant antibody includes at least one amino acid substitution that is not a back mutation.

In one embodiment of the present invention, a variant antibody for a host as compared to a parent antibody includes a variant VH antibody region with host string content (HSC) greater than about 75%, and a framework region homogeneity (FRH) less than about 60%, wherein the HSC and FRH are calculated with a window size of 9.

In one embodiment of the present invention, a variant antibody for a host as compared to a parent antibody includes a variant VH antibody region with exact string content greater than about 20%, and a framework region homogeneity less than about 60%, wherein the HSC and FRH are calculated with a window size of 9.

In one embodiment of the present invention, a variant antibody for a host as compared to a parent antibody includes a variant VL antibody region with exact string content greater than about 35%, and a framework region homogeneity less than about 60%, wherein the HSC and FRH are calculated with a window size of 9.

In one embodiment of the present invention, a variant antibody for a host as compared to a parent antibody includes a first set of one or more amino acid substitutions from a first natural antibody and a second set of one or more amino acid substitutions from a second natural antibody, wherein the identity of said substituted amino acids from said second antibody differs from the corresponding amino acids of said first natural antibody, the substitutions are not in a CDR, and at least one substitution is not a consensus of homologous natural sequences. In an additional embodiment, the variant antibody further includes a third set of one or more amino acid substitutions from a third natural antibody, wherein the identity of the substituted amino acids of said third set differs from the identity of the corresponding amino acids from said first and second sets of amino acid substitutions. In a further embodiment, the variant antibody includes a fourth set of one or more amino acids from a fourth natural antibody, wherein the identity of the substituted amino acids of said fourth set differs from the identity of the corresponding amino acids from said first, second and third sets of amino acid substitutions. In other embodiments, the variant antibody includes multiple sets of one or more amino acids from multiple natural antibodies, wherein the identity of the substituted amino acids of any set differs from the identity of the corresponding amino acids from the other sets of amino acid substitutions.

EXAMPLES

Examples are provided below to illustrate the present invention. These examples are not meant to constrain the present invention to any particular application or theory of operation.

For reference to immunoglobulin variable regions, positions are numbered according to the Kabat numbering scheme. For reference to immunoglobulin constant regions, positions are numbered according to the EU index as in Kabat (Kabat et al., 1991, Sequences of Proteins of Immunological Interest, 5th Ed., United States Public Health Service, National Institutes of Health, Bethesda).

Example 1 Immunogenicity Reduction of AC10

To illustrate application of the method described in the present invention, and to validate its broad applicability to immunogenicity reduction of proteins, a xenogeneic antibody example is provided using as the parent sequence the anti-CD30 antibody AC10 (Bowen et al., Journal of Immunology, 1993, 151: 5896). A structural model of the mouse AC10 variable region was constructed using standard antibody modeling methods known in the art. FIGS. 3 and 4 show the sequences (SEQ ID NOS: 118 and 119), host string content, and structures of the AC10 VL and VH domains (referred to as L0 AC10 VL and H0 AC10 VH respectively). A CDR graft of this antibody was constructed by placing the AC10 CDRs into the context of the frameworks of the most homologous host germlines, determined to be vlk_4-1 for VL and vh_1-3 for VH using the sequence alignment program BLAST. The sequences (SEQ ID NOS: 120 and 121) and string content of these CDR grafts are shown in FIGS. 5 and 6, along with structures of modeled AC10 highlighting the mutational differences between the CDR grafted AC10 variable chains and WT.

AC10 variants with reduced immunogenicity were generated by applying a string optimization algorithm on the WT AC10 VL and VH sequences. This algorithm heuristically samples multiple amino acid mutations that exist in the diversity of the human VLκ and VH germline sequences, and calculates the host string content (HSC) of each sequence according to Equation 3 described above, using a window size w=9. In this set of calculations, residues in the CDRs and close to a CDR or to the VL/VH interface were masked, that is, were not allowed to mutate. CDRs were defined as a slightly smaller set of residues than the CDRs defined by Chothia (Chothia & Lesk, 1987, J. Mol. Biol. 196: 901-917; Chothia et al., 1989, Nature 342: 877-883; Al-Lazikani et al., 1997, J. Mol. Biol. 273: 927-948). For the purposes of the present invention, VL CDRs are herein defined to include residues at positions 27-32 (CDR1), 50-56 (CDR2), and 91-97 (CDR3), wherein the numbering is according to Chothia. Because the VL CDRs as defined by Chothia and Kabat are identical, the numbering of these VL CDR positions is also according to Kabat. For the purposes of the present invention, VH CDRs are herein defined to include residues at positions 27-33 (CDR1), 52-56 (CDR2), and 95-102 (CDR3), wherein the numbering is according to Chothia. These VH CDR positions correspond to Kabat positions 27-35 (CDR1), 52-56 (CDR2), and 95-102 (CDR3). Masked residues in these calculations were set at positions 1-4, 25-34, 36, 38, 43, 44, 46, 48-58, 60, 63-69, 71, 87, and 89-98 for VL, and 2, 4, 24, 26-35, 37, 39, 44, 45, 47, 50-58, 60, 61, 71, 73, 76, 78, 91, and 93-106 for VH, wherein the numbering is according to Kabat. Masking of potentially critical residues is a conservative approach to generating more host antibody variants; however, it is but one embodiment of the present invention, and calculations wherein positions are not masked are also contemplated.
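The string-content calculation and the masking step can be illustrated with a minimal sketch. Equation 3 is not reproduced in this section, so the form used here is an assumption: HSC is taken as the mean, over all windows of size w, of the best identity between a window of the variant and the positionally aligned window of any host germline sequence, expressed as a fraction of w. The function names, the toy sequences, and the requirement that all sequences be pre-aligned to equal length are illustrative choices and are not part of the disclosure. Positions in the sketch are simple 1-based indices, not Kabat numbers.

```python
from typing import Dict, Sequence, Set


def identity(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def host_string_content(seq: str, hosts: Sequence[str], w: int = 9) -> float:
    """Assumed form of HSC: mean best-window identity, as a fraction of w."""
    if not hosts or any(len(h) != len(seq) for h in hosts):
        raise ValueError("need at least one host, pre-aligned to the same length")
    n_windows = len(seq) - w + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the window size")
    total = 0
    for i in range(n_windows):
        # Best match of this window against the same window in any host sequence.
        total += max(identity(seq[i:i + w], h[i:i + w]) for h in hosts)
    return total / (n_windows * w)


def candidate_substitutions(
    seq: str, hosts: Sequence[str], masked: Set[int]
) -> Dict[int, Set[str]]:
    """Host residues that differ from seq at each unmasked position (1-based)."""
    out: Dict[int, Set[str]] = {}
    for pos, res in enumerate(seq, start=1):
        if pos in masked:
            continue  # masked positions are not allowed to mutate
        options = {h[pos - 1] for h in hosts} - {res}
        if options:
            out[pos] = options
    return out


# Hypothetical toy data; these are not AC10 or actual germline sequences.
hosts = ["DIVMTQSPDAL", "EIVMTQSPDSL"]
variant = "DIVMTQSPDWL"
print(host_string_content(variant, hosts))
print(candidate_substitutions(variant, hosts, masked={1}))
```

In this toy case the sequence DIVMTQSPDSL would score 1.0 although it is identical to neither host sequence, because each of its windows exactly matches the corresponding window of one host or the other. This is the sense in which string content can be drawn from more than one germline.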
This calculation was run for AC10 VL and VH in 100 separate iterations, generating a set of diverse AC10 variants with more host string content than WT. FIG. 7 shows the clustered nonredundant set of output sequences from these calculations for the AC10 VL and VH region (SEQ ID NOS:122-217), referred to as AC10 VL HSC Calculation 1 and AC10 VH HSC Calculation 1 respectively. For each iteration (Iter), the HSC (Equation 3), HSS (Equation 5), and number (Mut) and identity (shaded residues) of mutations from WT are presented. In addition to the HSC score, each sequence was evaluated for its structural and functional integrity using a nearest neighbor structure-based scoring method (U.S. Ser. No. 60/528,229, filed Dec. 8, 2003, entitled Protein Engineering with Analogous Contact Environments). Two measures of structural fitness, referred to as “Structural Consensus” and “Structural Precedence”, are also provided in FIG. 7. Although the Analogous Contact Environments method is particularly well-suited for antibodies because of the wealth of sequence and structure information, any structure-based and/or sequence-based scoring method may be used to evaluate the structural and functional fitness of the variant sequences. The output sequences were clustered based on their mutational distance from the other sequences in the set, and these clusters are delineated by the horizontal black lines in the Figure. The “Cluster” column provides the quantitative mutational distance between each sequence and the rest of the sequences in its cluster; sequences with a lower cluster value are more representative of that particular sequence cluster.
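The “Cluster” value can be sketched as follows. The text does not state the clustering algorithm or whether the value is a sum or an average, so this sketch assumes that mutational distance is the number of differing positions between two equal-length sequences and that the cluster value is the mean distance from a sequence to the other members of its cluster. The function names and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def mutational_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_values(cluster: List[str]) -> List[Tuple[str, float]]:
    """Mean distance from each sequence to the other members (assumed definition)."""
    values = []
    for i, s in enumerate(cluster):
        others = [t for j, t in enumerate(cluster) if j != i]
        mean = sum(mutational_distance(s, t) for t in others) / len(others) if others else 0.0
        values.append((s, mean))
    return values


# Hypothetical toy cluster of three short sequences.
scored = cluster_values(["QVQLV", "QVQLQ", "EVQLQ"])
# The member with the lowest value is the most representative of the cluster.
representative = min(scored, key=lambda item: item[1])[0]
print(scored, representative)
```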

These calculations were used to generate a set of AC10 VL and VH variants. In some cases, further substitutions were made to output sequences, using string and structural scores, as well as visual inspection of the modeled AC10 structure, to evaluate fitness. FIGS. 8-13 present the sequences (SEQ ID NOS: 218-223), host string content, and mapped mutational differences on the modeled AC10 structure for each of the AC10 VL and VH variants. Iteration 36 from AC10 VL HSC calculation 1 served as the precursor for L1 AC10 VL, iteration 37 served as the precursor for L2 AC10 VL, and iteration 3 served as the precursor for L3 AC10 VL. Iteration 15 from AC10 VH HSC calculation 1 served as the precursor for H1 AC10 VH, iteration 55 served as the precursor for H2 AC10 VH, and iteration 18 served as the precursor for H3 AC10 VH.

Tables 1 and 2 present the number of mutations from the parent sequence, structural fitness scores, and host string scores for the AC10 VL and VH variants as compared to the WT and CDR grafted AC10 sequences. In addition to the aforementioned structural and host string analysis, the maximum identity match to the germline for each string in the sequences was determined, referred to as N_(ID)max. This represents the total number of strings in each sequence whose maximum identity to the corresponding strings in the host germline is the indicated value. For w=9, Tables 1 and 2 list N₉max, N₈max, N₇max, and N_(≤6)max for each sequence. N₉max represents the number of strings in the sequence for which 9 of 9 residues match at least one string in the host germline, N₈max represents the number of strings for which 8 of 9 residues match at least one string in the host germline, N₇max represents the number of strings for which 7 of 9 residues match at least one string in the host germline, and N_(≤6)max represents the number of strings for which 6 or fewer of 9 residues match at least one string in the host germline. This last category (ID≤6) could, for example, be regarded as the number of poorly scoring strings. Each sequence was also analyzed for its global homology to the host germline; Tables 1 and 2 present the most homologous human germline sequence for each sequence (Closest Germline) and corresponding identity to that germline (ID to Closest Germline), determined using the sequence alignment program BLAST. Finally, the framework region homogeneity (FRH) of each variant was evaluated for w=9, and is presented in Tables 1 and 2, providing the extent to which the host string content of each variant is derived from a single germline as opposed to multiple germline sequences.
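The binning of strings by maximum identity described above can be sketched as below. The sketch assumes that the variant and the host germline sequences are pre-aligned to equal length, so that "corresponding strings" are windows at the same offset; the function name, dictionary keys, and toy sequences are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def max_identity_counts(seq: str, hosts: Sequence[str], w: int = 9) -> Dict[str, int]:
    """Count windows by their best identity to any aligned host window."""
    low_key = f"N<={w - 3}max"
    counts = {f"N{w}max": 0, f"N{w - 1}max": 0, f"N{w - 2}max": 0, low_key: 0}
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        # Highest number of matching residues against any host at this offset.
        best = max(sum(x == y for x, y in zip(window, h[i:i + w])) for h in hosts)
        key = f"N{best}max" if best >= w - 2 else low_key
        counts[key] += 1
    return counts


# Hypothetical toy data; these are not AC10 or actual germline sequences.
hosts = ["DIVMTQSPDAL", "EIVMTQSPDSL"]
print(max_identity_counts("DIVMTQSPDWL", hosts))
print(max_identity_counts("AAAAAAAAAAA", hosts))
```

Every window falls into exactly one bin, so the four counts for a sequence always sum to the number of windows, len(seq) - w + 1.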

TABLE 1
AC10 VL Variants

                                WT            CDR Graft     L1            L2            L3
Mutations                                     18            15            16            9
Structural Consensus            0.57          0.57          0.59          0.64          0.56
Structural Precedence           0.68          0.57          0.66          0.67          0.58
Human String Content            0.78          0.88          0.86          0.86          0.85
Human String Similarity         0.15          0.57          0.48          0.46          0.41
Framework Region Homogeneity    0.60          0.97          0.73          0.81          0.52
N₉max                           15            61            51            48            42
N₈max                           27            11            13            15            21
N₇max                           31            15            21            22            20
N_(≤6)max                       34            20            22            22            24
Closest Germline                4-1           4-1           3-11          1-39          4-1
ID to Closest Germline          68/101 (67%)  86/101 (85%)  78/99 (79%)   80/99 (81%)   75/101 (74%)

TABLE 2
AC10 VH Variants

                                WT            CDR Graft     H1            H2            H3
Mutations                                     26            16            23            20
Structural Consensus            0.49          0.48          0.48          0.50          0.47
Structural Precedence           0.63          0.67          0.63          0.59          0.59
Human String Content            0.69          0.87          0.81          0.81          0.80
Human String Similarity         0.07          0.68          0.41          0.39          0.38
Framework Region Homogeneity    0.60          0.86          0.65          0.47          0.55
N₉max                           5             81            48            45            44
N₈max                           31            13            30            32            28
N₇max                           34            8             20            21            24
N_(≤6)max                       49            17            21            21            23
Closest Germline                1-3           1-3           1-3           1-3           7-4-1
ID to Closest Germline          69/98 (70%)   93/98 (95%)   83/98 (85%)   72/98 (73%)   76/98 (78%)
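As a reading aid, the arithmetic of the identity rows in Tables 1 and 2 can be checked with a short script. The values below are transcribed from the tables; the variable and function names are illustrative only. Two things are checked: that each percentage is the rounded ratio of the identity fraction, and that the four string counts for each sequence in a table sum to the same total, as would be expected if every sequence in a table contributes the same number of windows.

```python
# name: (N9max, N8max, N7max, N<=6max), (matches, aligned length), percent
TABLE_1_VL = {
    "WT": ((15, 27, 31, 34), (68, 101), 67),
    "CDR Graft": ((61, 11, 15, 20), (86, 101), 85),
    "L1": ((51, 13, 21, 22), (78, 99), 79),
    "L2": ((48, 15, 22, 22), (80, 99), 81),
    "L3": ((42, 21, 20, 24), (75, 101), 74),
}
TABLE_2_VH = {
    "WT": ((5, 31, 34, 49), (69, 98), 70),
    "CDR Graft": ((81, 13, 8, 17), (93, 98), 95),
    "H1": ((48, 30, 20, 21), (83, 98), 85),
    "H2": ((45, 32, 21, 21), (72, 98), 73),
    "H3": ((44, 28, 24, 23), (76, 98), 78),
}


def string_totals(table):
    """Set of distinct totals of the four string counts across a table."""
    return {sum(counts) for counts, _, _ in table.values()}


def percentages_consistent(table):
    """True if every listed percentage equals the rounded identity ratio."""
    return all(round(100 * m / n) == pct for _, (m, n), pct in table.values())


for name, table in (("Table 1", TABLE_1_VL), ("Table 2", TABLE_2_VH)):
    print(name, string_totals(table), percentages_consistent(table))
```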

An important observation is that, whereas the CDR grafted antibodies are most homologous to a single human germline sequence (the “acceptor” sequence in humanization terminology), the present invention describes variants that are homologous to different host germline sequences in different regions of the sequence. This is evident from the significant differences in Framework region homogeneity (FRH) scores between the AC10 variants of the present invention and the CDR grafted AC10 variants. Furthermore, whereas CDR grafted AC10 VL and VH are most homologous to human germline subfamilies 4 (VL) and 1 (VH) respectively across their entire sequences, a number of the AC10 variants are most homologous to different subfamilies in different frameworks. Additionally, whereas the CDR grafted antibodies are most homologous to a single germline sequence that is also the most homologous sequence to the parent sequence, the present invention provides, for a given parent antibody, a set of variants that are most homologous to different human germline sequences, which need not be the germline sequence most homologous to WT. For example, Table 1 shows that CDR grafted AC10 VL is most homologous to 4-1, which is also the most homologous human germline to the WT AC10 parent. However, L1, L2, and L3 are most homologous to three different human germlines: 3-11, 1-39, and 4-1 respectively. Thus the variants of the present invention explore a substantially greater amount of diversity than CDR grafted antibodies. One obvious advantage of this is that the method of the present invention provides a greater chance of success with respect to antigen affinity. The choice of an “acceptor” in humanization methods places a single bet; if the donor CDRs are in fact incompatible with the acceptor FRs, a set of backmutations that regain WT affinity may not exist.
In contrast, the method of the present invention enables a greater diversity of sequence and structure space to be sampled in the immunogenicity reduction process, increasing the chances of obtaining a final, less immunogenic version with WT affinity or better. An additional advantage of sampling greater sequence diversity is that some sequences may have more optimal properties than others, for example with regard to stability, solubility, and effector function. For example, as disclosed in U.S. Ser. No. 60/614,944, and U.S. Ser. No. 60/619,409, filed Oct. 14, 2004, entitled “Immunoglobulin Variants Outside the Fc Region with Optimized Effector Function”, the variable region of an antibody may impact effector functions such as antibody dependent cell-mediated cytotoxicity (ADCC), antibody dependent cell-mediated phagocytosis (ADCP), and complement dependent cytotoxicity (CDC).

The genes for the variable regions of AC10 WT (L0 and H0) and variants (L1, L2, L3, H1, H2, and H3) were constructed using recursive PCR, and subcloned into the mammalian expression vector pcDNA3.1Zeo (Invitrogen) comprising the full length light kappa (CLκ) and heavy chain IgG1 constant regions. All constructs were sequenced to confirm their fidelity. Plasmids containing the heavy chain gene (VH-CH1-CH2-CH3) (wild-type or variants) were co-transfected with plasmids containing the light chain gene (VL-CLκ) in all combinations (L0/H0, L0/H1, L0/H2, L0/H3, L1/H0, L1/H1, L1/H2, L1/H3, L2/H0, L2/H1, L2/H2, L2/H3, L3/H0, L3/H1, L3/H2, L3/H3) into 293T cells. Here, for example, L2/H3 refers to the L2 AC10 VL paired with H3 AC10 VH. Media were harvested 5 days after transfection, and antibodies were purified from the supernatant using protein A affinity chromatography (Pierce, Catalog #20334).

WT and variant antibodies were experimentally tested for their capacity to bind CD30 antigen. Binding affinity to human CD30 by the AC10 WT and variant antibodies was measured using the AlphaScreen™ assay, a quantitative and extremely sensitive method. The AlphaScreen™ assay is a bead-based non-radioactive luminescent proximity assay. Laser excitation of a donor bead excites oxygen, which if sufficiently close to the acceptor bead will generate a cascade of chemiluminescent events, ultimately leading to fluorescence emission at 520-620 nm. The AlphaScreen™ assay was applied as a competition assay for screening the antibodies. WT AC10 antibody was biotinylated by standard methods for attachment to streptavidin donor beads (Perkin Elmer). Commercial CD30 was conjugated to digoxigenin (DIG) (Roche Diagnostics) for attachment to anti-DIG acceptor beads (Perkin Elmer). In the absence of competing AC10 variants, WT antibody and CD30 interact and produce a signal at 520-620 nm. Addition of untagged AC10 variant competes with the WT AC10/CD30 interaction, reducing fluorescence quantitatively to enable determination of relative binding affinities. FIGS. 14a and 14b show binding of WT (H0L0) and AC10 variant antibodies to CD30 using the AlphaScreen™ assay. The data were fit to a one site competition model using nonlinear regression, and these fits are represented by the curves in the figure. These fits provide the 50% inhibitory concentration (IC50), i.e. the concentration required for 50% inhibition, for each antibody, thus enabling the binding affinities relative to WT to be determined. Table 3 provides the IC50's and Fold IC50's relative to WT for fits to these binding curves. The AC10 variants display an array of CD30 binding affinities, with a number of variants binding CD30 with affinity comparable to or better than that of WT AC10.
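As a rough illustration of how an IC50 can be extracted from a competition curve, the sketch below fits a one site competition model with unit slope, signal = bottom + (top − bottom)/(1 + [competitor]/IC50). It is not the regression software used for FIGS. 14a and 14b: the log-spaced grid search, the function names, and the data points (which are synthetic, generated from the model itself) are all assumptions made for the example.

```python
def one_site_competition(conc, top, bottom, ic50):
    """Signal as a function of competitor concentration (unit Hill slope)."""
    return bottom + (top - bottom) / (1.0 + conc / ic50)

def fit_ic50(concs, signals, log_lo=-1.0, log_hi=3.0, steps=401):
    """Scan IC50 on a log grid; for each candidate, top and bottom follow
    from ordinary linear least squares because the model is linear in them."""
    n = len(concs)
    best = None
    for k in range(steps):
        ic50 = 10 ** (log_lo + (log_hi - log_lo) * k / (steps - 1))
        x = [1.0 / (1.0 + c / ic50) for c in concs]
        mx, my = sum(x) / n, sum(signals) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, signals))
        slope = sxy / sxx                 # equals top - bottom
        bottom = my - slope * mx
        sse = sum((bottom + slope * xi - yi) ** 2
                  for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, ic50, bottom + slope, bottom)
    _, ic50, top, bottom = best
    return ic50, top, bottom

# Synthetic, noise-free points generated from the model (not assay data).
concs_nM = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
signals = [one_site_competition(c, 1000.0, 50.0, 5.0) for c in concs_nM]
fitted_ic50, fitted_top, fitted_bottom = fit_ic50(concs_nM, signals)
```

A Fold IC50 relative to WT then follows as the WT IC50 divided by the variant IC50, so that values above 1 indicate tighter binding than WT.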

Antigen affinity of the AC10 variants was also measured using Surface Plasmon Resonance (SPR) (Biacore, Uppsala, Sweden). SPR allows for the measurement of direct binding rates and affinities of protein-protein interactions, and thus provides an excellent complementary binding assay to the AlphaScreen™ assay. CD30 fused to the Fc region of IgG1 (R&D Systems) was immobilized on a Protein A SPR chip, the surface was blocked with Fc, and WT and variant AC10 antibodies were flowed over the chip at a range of concentrations. The resulting sensorgrams are shown in FIG. 15. Global Langmuir fits were carried out for the concentration series using the BiaEvaluation curve fitting software, providing the on-rate constant (ka), off-rate constant (kd), and equilibrium dissociation constant (KD=kd/ka) for the curves. Table 3 provides the KDs and Fold KDs relative to WT for the SPR data. The excellent agreement between the rank ordering of the variants as determined by SPR and by the AlphaScreen™ assay supports the accuracy of the data.

TABLE 3 CD30 Binding of AC10 Variants

AC10      SPR       SPR       AlphaScreen  AlphaScreen
Variant   KD (nM)   Fold KD   IC50 (nM)    Fold IC50
H2L1      9.49      0.36      55.1         0.06
H2L2      5.95      0.57      49.2         0.06
H1L2      7.55      0.45      45.3         0.07
H1L1      5.63      0.60      27.7         0.11
H2L3      6.75      0.50      27.2         0.12
H2L0      8.00      0.42      19.4         0.16
H1L3      5.09      0.67      17.4         0.18
H1L0      6.39      0.53      9.77         0.32
H0L2      3.48      0.97      7.81         0.41
H3L2      2.86      1.19      6.57         0.48
H3L0      3.08      1.10      6.18         0.51
H3L1      2.44      1.39      6.09         0.52
H0L1      3.29      1.03      5.19         0.61
H0L3      3.00      1.13      4.61         0.69
H0L0      3.39      1.00      3.18         1.00
H3L3      2.33      1.45      1.99         1.59
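The Fold values in Table 3 and the rank agreement between the two assays can be recomputed from the tabulated KD and IC50 values. The sketch below assumes Fold = (WT value)/(variant value), which reproduces the tabulated entries to within rounding, and uses a Spearman rank correlation as one possible way to quantify rank agreement; the choice of statistic and the helper names are assumptions of this example, not something stated in the text.

```python
# (variant, SPR KD in nM, AlphaScreen IC50 in nM), copied from Table 3.
TABLE_3 = [
    ("H2L1", 9.49, 55.1), ("H2L2", 5.95, 49.2), ("H1L2", 7.55, 45.3),
    ("H1L1", 5.63, 27.7), ("H2L3", 6.75, 27.2), ("H2L0", 8.00, 19.4),
    ("H1L3", 5.09, 17.4), ("H1L0", 6.39, 9.77), ("H0L2", 3.48, 7.81),
    ("H3L2", 2.86, 6.57), ("H3L0", 3.08, 6.18), ("H3L1", 2.44, 6.09),
    ("H0L1", 3.29, 5.19), ("H0L3", 3.00, 4.61), ("H0L0", 3.39, 3.18),
    ("H3L3", 2.33, 1.99),
]

def fold_relative_to_wt(values, wt="H0L0"):
    """Fold > 1 means a lower KD or IC50 (tighter binding) than WT."""
    return {name: values[wt] / v for name, v in values.items()}

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties (true for Table 3)."""
    ordered = sorted(values, key=values.get)
    return {name: i + 1 for i, name in enumerate(ordered)}

def spearman_rho(a, b):
    """Spearman rank correlation for tie-free data."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))

kd = {name: k for name, k, _ in TABLE_3}
ic50 = {name: c for name, _, c in TABLE_3}
fold_kd = fold_relative_to_wt(kd)
fold_ic50 = fold_relative_to_wt(ic50)
rho = spearman_rho(kd, ic50)
```

Because the tabulated KD and IC50 values are themselves rounded, a recomputed Fold value may differ from the table in the last digit.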

In addition to being assessed for antigen affinity and biophysical properties, the variants of the present invention may also be tested for effector functions in the context of a full length antibody. One advantage of generating multiple reduced immunogenicity variants of a parent immunoglobulin is that it enables a greater degree of sequence diversity to be sampled, diversity which may provide optimal properties. Some sequences may have more optimal properties than others, for example with regard to effector function. For example, as disclosed in U.S. Ser. No. 60/614,944, and U.S. Ser. No. 60/619,409, filed Oct. 14, 2004, entitled “Immunoglobulin Variants Outside the Fc Region with Optimized Effector Function”, the variable region of an antibody may impact effector functions such as antibody dependent cell-mediated cytotoxicity (ADCC), antibody dependent cell-mediated phagocytosis (ADCP), and complement dependent cytotoxicity (CDC).

In order to explore any differences in capacity to mediate effector function, the affinities of the AC10 variants for FcγRIIIa were measured using the AlphaScreen™ assay. The extracellular region of human V158 FcγRIIIa was obtained by PCR from a clone obtained from the Mammalian Gene Collection (MGC:22630), and the receptor was fused with glutathione S-transferase (GST) to enable screening. Tagged FcγRIIIa was transfected into 293T cells, and media containing secreted FcγRIIIa were harvested and purified. The AlphaScreen™ assay was applied as a competition assay for screening AC10 variants for binding to FcγRIIIa. Biotinylated WT AC10 antibody was bound to streptavidin donor beads (Perkin Elmer), and GST-fused human V158 FcγRIIIa was bound to anti-GST acceptor beads (Perkin Elmer). The binding data are shown in FIGS. 16a and 16b, and the resulting IC50's and Fold IC50's relative to WT are provided in Table 4. FcγRIIIa affinity of the AC10 variants was also measured using SPR. GST-fused human FcγRIIIa (V158 isoform) was immobilized on a chip, and WT and variant AC10 antibodies were flowed over the chip at a range of concentrations. Binding constants were obtained from fitting the data using standard curve-fitting methods. The equilibrium dissociation constants (KDs) obtained from the fits to these binding curves, and the calculated fold improvement or reduction relative to WT (Fold KD), are shown in Table 4.

TABLE 4 FcγRIIIa Binding of AC10 Variants

AC10      SPR       SPR       AlphaScreen  AlphaScreen
Variant   KD (nM)   Fold KD   IC50 (nM)    Fold IC50
H2L1      14.9      1.25      751          0.12
H2L2      4.01      4.64      146          0.60
H1L2      16.6      1.12      340          0.26
H1L1      11.2      1.66      221          0.39
H2L3      3.52      5.28      183          0.48
H2L0      12.9      1.44      175          0.50
H1L3      11.2      1.66      178          0.49
H1L0      22.0      0.85      71.6         1.22
H0L2      9.09      2.05      93.8         0.93
H3L2      3.57      5.21      88.7         0.98
H3L0      20.0      0.93      216          0.40
H3L1      17.4      1.07      209          0.42
H0L1      11.6      1.60      183          0.48
H0L3      12.7      1.46      146          0.60
H0L0      18.6      1.00      87.2         1.00
H3L3      6.13      3.03      83.5         1.04

To assess the capacity of the AC10 variants to mediate effector function against CD30 expressing cells, the AC10 variants were tested in a cell-based ADCC assay. Human peripheral blood mononuclear cells (PBMCs) were isolated from buffy coat and used as effector cells, and CD30 positive L540 Hodgkin's lymphoma cells were used as target cells. L540 target cells were seeded at 20,000 per well in 96-well plates and treated with the designated antibodies in triplicate, starting at 1 μg/ml and continuing at reduced concentrations in ½ log steps. PBMCs isolated using a Ficoll gradient and allotyped as FcγRIIIa 158 V/F were added at 25-fold excess of L540 cells and co-cultured for 4 hrs before processing for LDH activity using the Cytotoxicity Detection Kit (LDH, Roche Diagnostic Corporation, Indianapolis, Ind.) according to the manufacturer's instructions. The plates were read using a Wallac 1420 Victor²™. FIGS. 17a-17c show the results. The graphs show that the antibodies differ not only in their EC50, reflecting their relative potency, but also in the maximal level of ADCC attainable by the antibodies at saturating concentrations, reflecting their relative efficacy. These two terms, potency and efficacy, are sometimes used loosely to refer to desired clinical properties. In the current experimental context, however, they denote specific quantities, and are therefore explicitly defined here. By “potency” as used in the current experimental context is meant the EC50 of an antibody. By “efficacy” as used in the current experimental context is meant the maximal possible effector function of an antibody at saturating levels. Differences in capacity to mediate ADCC may be due to differences in antigen affinity, different capacities of the variant variable regions to affect FcγR binding, or both. Regardless, the contribution of an antibody variable region to FcγR binding and effector function may be an important parameter for selecting a clinical candidate.
The choice of an antibody clinical candidate based in whole or in part on the impact of the variable region on effector function represents a novel dimension in antibody therapeutics.
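The plate set-up arithmetic in the ADCC protocol above can be written out explicitly. The sketch below assumes that a "½ log step" means division by 10^0.5 (about 3.16-fold) per step; the number of dilution points is not stated in the text, so n_points is a hypothetical parameter, as are the function and variable names.

```python
def half_log_series(start_ug_per_ml, n_points):
    """Concentrations starting at start_ug_per_ml and decreasing by a
    factor of 10**0.5 (one half-log) per step."""
    return [start_ug_per_ml * 10 ** (-k / 2) for k in range(n_points)]

TARGETS_PER_WELL = 20_000        # L540 target cells per well (from the text)
EFFECTOR_TO_TARGET = 25          # PBMCs added at 25-fold excess (from the text)

effectors_per_well = TARGETS_PER_WELL * EFFECTOR_TO_TARGET
antibody_series = half_log_series(1.0, n_points=8)   # 8 points is an assumption
```

Every second point of such a series is a full ten-fold dilution, which is convenient when plotting the dose-response curves on a log axis.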

Based on the CD30 binding, FcγRIIIa binding, and ADCC results, the H3/L3 AC10 variant was chosen as a potential biotherapeutic candidate. Because this antibody is intended for clinical use as an anti-cancer therapeutic, it may be advantageous to optimize its effector function. As previously described, substitutions can be engineered in the constant region of an antibody to provide favorable clinical properties. In a most preferred embodiment, one or more amino acid modifications that provide optimized binding to FcγRs and/or enhanced effector function, described in U.S. Ser. No. 10/672,280, PCT US03/30249, U.S. Ser. No. 10/822,231, and U.S. Ser. No. 60/627,774, filed Nov. 12, 2004 and entitled “Optimized Fc Variants”, are combined with the AC10 variants of the present invention. A number of optimized Fc variants obtained from these studies, including I332E, S239D, V264I/I332E, S239D/I332E, and S239D/A330L/I332E, were constructed in the H0/L0 and H3/L3 AC10 antibodies using QuikChange mutagenesis (Stratagene). Antibodies were expressed and purified as described above. FIGS. 18a and 18b show the results of the ADCC assay, carried out as described above, comparing WT (H0/L0) and H3/L3 AC10 in combination with the optimized Fc variants. Considerable enhancements in potency and efficacy are observed for the Fc variant antibodies as compared to H0/L0 and H3/L3 AC10.

As described above, because variant sequences of the invention are preferably derived from a HSC-increasing procedure in which substitution of structurally important positions is disallowed (or discouraged), it is likely that additional optimization of HSC is possible if those positions are allowed to vary in a secondary analysis. It is noted that, due to residue masking, mutations in the variants occur distal to the CDRs and VL/VH interface. This is in contrast to CDR grafted antibodies, which have mutations relative to the parent that are at or near these critical regions and thus have a significantly greater potential for perturbing antigen affinity. This is corroborated by the fact that CDR grafted antibodies typically require backmutations to the donor sequence to regain WT affinity for antigen. Such backmutations are usually made out of structural and immunogenic context with respect to host sequences, and cause dramatic reductions in the host string content of the final variant. In contrast, the variants presented herein are simultaneously optimized for host string content and structural fitness within the same context, and no backmutations need be made. Nonetheless, one or more subsequent substitutions may be explored to increase antigen affinity or further improve HSC, for example by mutating residues that were masked in the calculations and/or residues in or close to the CDRs or VL/VH interface. Thus the H3/L3 variant can be thought of as a primary variant or template for further optimization, and variants of H3/L3 can be thought of as secondary variants. In contrast to backmutating as with CDR grafted antibodies, secondary substitutions in the variants of the present invention will comprise forward or neutral mutations with respect to the host germline, and thus are expected only to improve HSC or leave it unaffected.
An additional benefit of generating secondary variants is that, by exploring quality structural and string diversity, it is also possible that other properties can be optimized, for example affinity, activity, specificity, solubility, expression level, and effector function.

String analysis was carried out on the H3/L3 sequence to design a set of secondary substitutions that have neutral, positive, or minimal impact on HSC, and/or that have significant potential for optimization of antigen affinity and/or effector function. Table 5 provides this set of 70 VL (Table 5a) and 64 VH (Table 5b) single mutations. The L3 column (Table 5a) or H3 column (Table 5b) provides the amino acid present in the primary variant, and the Sub column provides the designed substitution. Positions are numbered according to the Kabat numbering format, with Kabat CDR positions bolded. The provided string impact, defined according to Equation 7, describes the difference in HSC between the primary variant sequence, here H3/L3, and the secondary variant sequence.

TABLE 5a L3 AC10 Secondary Variants Pos String Fold Fold Fold Variant(Kabat) L3 Sub Impact Prot A CD30 FcγRIIIa L3.1  1 D A 0 0.89 1.30 0.96L3.2  1 D E 1 1.09 1.24 1.46 L3.3  1 D N 0 1.24 1.66 1.35 L3.4  1 D S 00.97 1.04 1.18 L3.5  3 V Q 0 1.13 1.32 1.32 L3.6  4 L M 4 1.65 1.64 1.21L3.7 25 A S 6 L3.8  27a S D 0 1.02 0.94 0.61 L3.9  27b V I 1 1.04 0.580.81 L3.10  27c D L 5 L3.11  27c D S 8 L3.12  27c D V 3 1.24 1.15 1.19L3.13  27d F D 0 1.07 0.32 0.98 L3.14  27d F H 4 0.86 0.08 0.93 L3.15 27d F Y 5 1.04 0.63 1.19 L3.16 28 D N −1 1.10 1.61 1.18 L3.17 30 D K 41.01 1.21 1.24 L3.18 30 D N 6 1.09 1.45 0.94 L3.19 30 D S 4 1.07 1.130.82 L3.20 30 D Y 0 0.82 0.78 0.73 L3.21 31 S D 0 1.01 0.81 0.95 L3.2231 S T 3 1.03 0.46 0.97 L3.23 31 S N −1 1.03 0.71 1.00 L3.24 32 Y D 01.31 0.46 1.33 L3.25 33 M L 8 1.38 1.36 1.37 L3.26 34 N S 0 L3.27 34 N A−1 1.39 0.38 1.36 L3.28 34 N D −6 1.19 0.41 1.76 L3.29 46 V H 4 0.060.03 0.11 L3.30 46 V L 9 0.86 0.39 0.75 L3.31 46 V R 4 L3.32 46 V S 41.05 0.32 0.90 L3.33 50 A D 6 0.98 0.26 0.66 L3.34 50 A S 5 1.01 0.471.20 L3.35 50 A W 2 L3.36 53 N S 8 L3.37 53 N T 5 0.99 1.26 1.01 L3.3854 L R 1 1.19 1.46 1.55 L3.39 55 E A 2 1.01 0.85 1.32 L3.40 55 E Q 60.99 0.87 1.07 L3.41 56 S T 8 1.50 1.80 1.23 L3.42 58 I V 4 1.44 1.550.95 L3.43 60 A D 1 1.11 1.16 1.08 L3.44 60 A S 2 0.82 1.08 0.85 L3.4567 S P 0 L3.46 89 Q H 1 1.37 0.08 1.64 L3.47 91 S A 8 L3.48 91 S G 90.85 0.29 0.80 L3.49 91 S H 2 1.20 0.01 1.32 L3.50 91 S L 8 1.10 0.021.59 L3.51 91 S Y 8 1.00 0.02 1.50 L3.52 92 N I 3 L3.53 92 N S 2 3.020.48 1.34 L3.54 92 N Y 8 1.39 0.96 1.05 L3.55 93 E K 8 0.62 0.27 0.49L3.56 93 E N 8 1.06 0.64 0.84 L3.57 93 E Q 2 L3.58 93 E S 8 0.90 0.490.87 L3.59 94 D A 3 1.16 0.09 1.14 L3.60 94 D F 9 1.22 0.02 1.19 L3.6194 D H 8 L3.62 94 D L 3 0.87 0.46 0.79 L3.63 94 D S 1 1.74 0.57 1.42L3.64 94 D T 7 1.24 0.14 1.16 L3.65 96 W F 0.33 0.34 0.29 L3.66 96 W I0.75 0.00 0.57 L3.67 96 W L L3.68 96 W Y L3.69 100  G P L3.70 100  G Q

TABLE 5b H3 AC10 Secondary Variants Position String Fold Fold Variant(Kabat) H3 Sub Impact Prot A CD30 H3.1  1 Q E −1 0.83 1.00 H3.2  2 I L 01.60 2.76 H3.3  2 I M 2 0.88 0.68 H3.4  2 I V 0 0.98 1.28 H3.5  9 P A 20.95 1.29 H3.6 16 A T 2 0.89 1.13 H3.7 24 A V 2 1.54 4.45 H3.8 31 D G 20.80 1.40 H3.9 31 D S 2 0.82 1.65 H3.10 33 Y D 2 0.68 0.07 H3.11 33 Y G3 0.96 0.73 H3.12 33 Y W 0 0.84 0.00 H3.13 34 I L 1 0.96 1.52 H3.14 34 IM 8 1.05 1.62 H3.15 35 T D 1 1.55 0.05 H3.16 35 T G 2 1.03 0.15 H3.17 35T H 8 0.86 0.04 H3.18 35 T N 4 1.07 0.13 H3.19 35 T S 6 0.88 1.11 H3.2044 G A 0 1.20 2.04 H3.21 44 G R 0 1.36 2.60 H3.22 50 W I 6 1.25 0.01H3.23 50 W R 0 0.99 0.16 H3.24 52 Y N 2 1.03 0.03 H3.25 52 Y T 1 1.110.06 H3.26 52 Y V 1 1.33 0.06 H3.27  52a P A 1 1.02 2.00 H3.28  52a P V1 1.44 1.34 H3.29 54 S D 1 1.45 1.81 H3.30 54 S N 5 1.13 1.45 H3.31 58 KG 4 H3.32 58 K I 2 1.22 1.09 H3.33 58 K N 5 1.26 0.50 H3.34 60 N A 70.87 1.30 H3.35 60 N P 0 H3.36 60 N S 7 1.02 1.24 H3.37 60 N T 0 1.121.01 H3.38 60 N V 0 1.16 1.14 H3.39 60 N D 0 1.09 1.00 H3.40 61 E Q 71.51 1.83 H3.41 64 Q T 0 0.98 1.38 H3.42 71 V L 4 1.10 0.66 H3.43 71 V M9 1.17 0.88 H3.44 71 V R 1 1.25 1.76 H3.45 87 T M −1 0.99 1.14 H3.46 89V M −4 1.41 1.39 H3.47 91 F H 2 H3.48 91 F Y 9 1.32 1.60 H3.49 93 A T 01.47 0.38 H3.50 93 A V 0 1.01 1.40 H3.51 94 N A 9 1.51 0.08 H3.52 94 N H5 1.23 1.24 H3.53 94 N K 9 1.67 0.02 H3.54 94 N R 9 1.26 0.00 H3.55 94 NT 9 1.24 0.91 H3.56 99 W Y 1.26 0.07 H3.57 101  A D 1.31 0.53 H3.58 101 A Q 1.17 0.16 H3.59 102  Y H 1.69 1.05 H3.60 102  Y S 1.04 0.82 H3.61102  Y V 1.33 1.21 H3.62 102  Y L 1.34 1.22 H3.63 102  Y F 1.18 1.24H3.64 105  Q R 1.15 1.28

The secondary H3/L3 variants were constructed using QuikChange mutagenesis, and the full length antibodies were expressed and purified as described above. H3 variants comprised H3 variant VH chains (H3.1-H3.64) in combination with L3 VL, and L3 variants comprised L3 variant VL chains (L3.1-L3.70) in combination with H3 VH. The AlphaScreen™ assay was used to measure binding of the H3/L3 secondary variants to CD30 and FcγRIIIa (as described earlier), as well as to protein A, using biotinylated AC10 bound directly to protein A acceptor beads and to streptavidin donor beads. FIG. 19 provides AlphaScreen™ binding curves for binding of select AC10 variants to CD30. The Fold IC50's relative to the H3/L3 parent for binding to CD30, FcγRIIIa, and protein A are provided in Table 5. A number of H3/L3 secondary variants provide comparable or improved binding to CD30 antigen relative to the H3/L3 parent, enabling the engineering of additional variants that comprise combinations of these substitutions, which may provide further enhancements in HSC and/or antigen affinity.
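One way to shortlist secondary substitutions for combination is to filter the Table 5 entries on string impact and Fold CD30. The sketch below uses a handful of rows copied from Table 5b; the two thresholds, the record layout, and the reading of a non-negative string impact as "HSC not reduced" are assumptions made for illustration and are not selection criteria stated in the text.

```python
from typing import NamedTuple

class Secondary(NamedTuple):
    name: str
    kabat_pos: str
    parent_aa: str
    sub_aa: str
    string_impact: int
    fold_cd30: float

# A few rows copied from Table 5b (H3 secondary variants).
CANDIDATES = [
    Secondary("H3.2", "2", "I", "L", 0, 2.76),
    Secondary("H3.7", "24", "A", "V", 2, 4.45),
    Secondary("H3.10", "33", "Y", "D", 2, 0.07),
    Secondary("H3.14", "34", "I", "M", 8, 1.62),
    Secondary("H3.22", "50", "W", "I", 6, 0.01),
    Secondary("H3.46", "89", "V", "M", -4, 1.39),
    Secondary("H3.48", "91", "F", "Y", 9, 1.60),
]

def shortlist(candidates, min_string_impact=0, min_fold_cd30=1.0):
    """Keep substitutions that do not lower HSC (assumed sign convention)
    and that bind CD30 at least as well as the H3/L3 parent."""
    return [c for c in candidates
            if c.string_impact >= min_string_impact
            and c.fold_cd30 >= min_fold_cd30]

selected = shortlist(CANDIDATES)
selected_names = [c.name for c in selected]
```

In practice other measured properties, such as Fold FcγRIIIa or protein A binding, could be added as further filter terms in the same way.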

Secondary substitutions that show favorable properties with respect to antigen affinity, effector function, stability, solubility, expression, and the like, may be combined in subsequent variants to generate a more optimized therapeutic candidate. Two new VL and three new VH variants were designed that comprise combinations of the described secondary substitutions, referred to as L3.71, L3.72, H3.68, H3.69, and H3.70. FIGS. 20-24 present the sequences (SEQ ID NOS: 224-228), host string content, and mapped mutational differences on the modeled AC10 structure for each of these new AC10 VL and VH variants. Table 6 presents the number of mutations from the parent sequence, structural fitness scores, host string scores, and homology scores for these AC10 VL and VH variants.

TABLE 6 AC10 Variants

                               L3.71   L3.72   H3.68   H3.69   H3.70
Mutations                      15      15      23      27      30
Structural Consensus           0.56    0.55    0.46    0.46    0.45
Structural Precedence          0.54    0.52    0.55    0.57    0.56
Human String Content           0.88    0.87    0.80    0.83    0.84
Human String Similarity        0.52    0.45    0.39    0.47    0.47
Framework Region Homogeneity   0.47    0.51    0.33    0.40    0.42
N₉max                          55      47      46      55      55
N₈max                          19      24      26      26      34
N₇max                          14      17      24      20      12
N_(≦6)max                      19      19      23      18      18
Closest Germline               4-1     4-1     7-4-1   1-3     1-3
ID to Closest Germline         76/101  76/101  74/98   77/98   79/98
                               75%     75%     76%     79%     81%

Because the provided AC10 variant antibodies are clinical candidates for anti-cancer therapeutics, it may be advantageous to optimize their effector function. As previously described, substitutions can be engineered in the constant region of an antibody to provide favorable clinical properties. Combinations of the variants of the present invention with Fc modifications that alter effector function are anticipated. In a most preferred embodiment, one or more amino acid modifications that provide optimized binding to FcγRs and/or enhanced effector function, described in U.S. Ser. No. 10/672,280, PCT US03/30249, U.S. Ser. No. 10/822,231, and U.S. Ser. No. 60/627,774, filed Nov. 12, 2004 and entitled “Optimized Fc Variants”, are combined with the AC10 variants of the present invention. The optimal anti-CD30 clinical candidate may comprise amino acid modifications that reduce immunogenicity and enhance effector function relative to a parent anti-CD30 antibody. FIGS. 25a-25c (SEQ ID NOS:229-231) provide the light and heavy chain sequences of AC10 variants that comprise L3.71/H3.70 AC10 as described above, combined with a number of possible variant IgG1 constant regions, comprising one or more modifications at S239, V264, A330, and I332, that provide enhanced effector function.

Although human IgG1 is the most commonly used constant region for therapeutic antibodies, other embodiments may utilize constant regions or variants thereof of other IgG immunoglobulin chains. Effector functions such as ADCC, ADCP, CDC, and serum half-life differ significantly between the different classes of antibodies, including for example human IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, IgD, IgE, IgG, and IgM (Michaelsen et al., 1992, Molecular Immunology, 29(3): 319-326). A number of studies have explored IgG1, IgG2, IgG3, and IgG4 variants in order to investigate the determinants of the effector function differences between them. See for example Canfield & Morrison, 1991, J. Exp. Med. 173: 1483-1491; Chappel et al., 1991, Proc. Natl. Acad. Sci. USA 88(20): 9036-9040; Chappel et al., 1993, Journal of Biological Chemistry 268:25124-25131; Tao et al., 1991, J. Exp. Med. 173: 1025-1028; Tao et al., 1993, J. Exp. Med. 178: 661-667; Redpath et al., 1998, Human Immunology, 59, 720-727. Using methods known in the art, it is possible to determine corresponding or equivalent residues in proteins that have significant sequence or structural homology with each other. By the same token, it is possible to use such methods to engineer amino acid modifications in an antibody or Fc fusion that comprises constant regions from other immunoglobulin classes, for example as described in U.S. Ser. Nos. 60/621,387 and 60/629,068, to provide optimal properties. As an example, the relatively poor effector function of IgG2 may be improved by replacing key FcγR binding residues with the corresponding amino acids in an IgG with better effector function, for example IgG1. For example, key residue differences between IgG2 and IgG1 with respect to FcγR binding may include P233, V234, A235, -236 (referring to a deletion in IgG2 relative to IgG1), and G327.
Thus one or more amino acid modifications in the parent IgG2 wherein one or more of these residues is replaced with the corresponding IgG1 amino acid, namely P233E, V234L, A235L, -236G (referring to an insertion of a glycine at position 236), and G327A, may provide enhanced effector function. Furthermore, one or more additional amino acid modifications, for example S239D, V264I, A330L, I332E, or combinations thereof as described above, may provide enhanced FcγR binding and effector function relative to the parent IgG2. FIGS. 25a (SEQ ID NO:229), 25d (SEQ ID NO:232), and 25e (SEQ ID NO:233) illustrate this embodiment, providing the light and heavy chain sequences of AC10 variants that comprise L3.71/H3.70 AC10 combined with a number of possible variant IgG2 constant regions.

The Fc modifications defined in FIG. 25 that provide enhanced effector function are not meant to constrain the invention to only these modifications for effector function optimization. For example, as described in U.S. Pat. No. 6,737,056, PCT US2004/000643, U.S. Ser. No. 10/370,749, and PCT/US2004/005112, the substitutions S298A, S298D, K326E, K326D, E333A, K334A, and P396L provide optimized FcγR binding and/or enhanced ADCC. Furthermore, as disclosed in Idusogie et al., 2001, J. Immunology 166:2571-2572, substitutions K326W, K326Y, and E333S provide enhanced binding to the complement protein C1q and enhanced CDC. As described in Hinton et al., 2004, J. Biol. Chem. 279(8): 6213-6216, substitutions T250Q, T250E, M428L, and M428F provide enhanced binding to FcRn and improved pharmacokinetics. Modifications need not be restricted to the Fc region. It is also possible that mutational differences in the Fab and hinge regions may provide optimized FcγR and/or C1q binding and/or effector function. For example, as disclosed in U.S. Ser. No. 60/614,944, and U.S. Ser. No. 60/619,409, filed Oct. 14, 2004, entitled “Immunoglobulin Variants Outside the Fc Region with Optimized Effector Function”, the Fab and hinge regions of an antibody may impact effector functions such as antibody dependent cell-mediated cytotoxicity (ADCC), antibody dependent cell-mediated phagocytosis (ADCP), and complement dependent cytotoxicity (CDC). Thus immunoglobulin variants comprising substitutions in the Fc, Fab, and/or hinge regions are contemplated. For example, the antibodies may be combined with one or more substitutions in the VL, CL, VH, CH1, and/or hinge regions. Furthermore, additional modifications may be made in non-IgG1 immunoglobulins to the corresponding amino acids in other immunoglobulin classes to provide more optimal properties, as described in U.S. Ser. No. 60/621,387, filed Oct. 21, 2004, entitled “IgG Immunoglobulin Variants with Optimized Effector Function”.
For example, in one embodiment, an IgG2 antibody, similar to the antibody presented in FIG. 25, may comprise one or more modifications to corresponding amino acids in IgG1 or IgG3 CH1, hinge, CH2, and/or CH3. In another embodiment, an IgG2 antibody, similar to the antibody presented in FIG. 25, may comprise all of the IgG1 CH1 and hinge substitutions, i.e. said IgG2 variant comprises the entire CH1 domain and hinge of IgG1.

Example 2 Immunogenicity Reduction of C225

To illustrate further application of the method described in the present invention, and to validate its broad applicability to immunogenicity reduction of proteins, a second xenogeneic antibody example is provided using the variable region of C225 as the parent sequence (cetuximab, Erbitux®, Imclone®) (U.S. Pat. No. 4,943,533; PCT WO 96/40210). C225 is a murine anti-EGFR antibody, a chimeric version of which is currently approved for the treatment of cancer. A structural model of the murine C225 variable region was constructed using standard antibody modeling methods known in the art. FIGS. 26 and 27 show the sequences (SEQ ID NOS:234 and 235), host string content, and structures of the C225 VL and VH domains. A CDR graft of this antibody was constructed by placing the C225 CDRs into the context of the frameworks of the most homologous human germlines, determined to be vlk_6D-21 for VL and vh_4-30-4 for VH using the sequence alignment program BLAST. The sequences (SEQ ID NOS:236 and 237) and string content of these CDR grafts are shown in FIGS. 28 and 29, along with structures of modeled C225 highlighting the mutational differences between the CDR grafted C225 variable chains and WT.

Variants with reduced immunogenicity were generated by applying a string optimization algorithm to the WT C225 VL and VH sequences, similar to that described above for AC10 except that single instead of multiple amino acid substitutions were sampled. HSC of each sequence was optimized using a window size w=9, and the same set of CDR and VL/VH interface proximal residues were masked. The calculation was run for C225 VL and VH in 100 separate iterations, generating a set of diverse C225 variants with more host string content than WT. FIG. 30 shows the nonredundant set of output sequences from these calculations for the C225 VL and VH regions (SEQ ID NOS:238-274), referred to as C225 VL HSC Calculation 1 and C225 VH HSC Calculation 1, respectively. In addition to the HSC score, the structural consensus and structural precedence of each sequence were evaluated (U.S. Ser. No. 60/528,229, filed Dec. 8, 2003, entitled Protein Engineering with Analogous Contact Environments) in order to evaluate its structural integrity.
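The general shape of such a calculation, in which single substitutions are sampled at unmasked positions and kept when they raise HSC, can be sketched as below. This is a toy stand-in rather than the algorithm of the invention: the hsc function is a simplified assumption (the mean, over all w-residue windows, of the best fractional identity to the same-position window of any pre-aligned germline), the acceptance rule is plain hill climbing, structural fitness is not scored, and the sequences, mask, and names are hypothetical.

```python
import random

def hsc(seq, germlines, w=9):
    """Simplified host string content: mean over windows of the best
    fractional identity to any germline window at the same position."""
    n_windows = len(seq) - w + 1
    total = 0.0
    for i in range(n_windows):
        best = max(sum(a == b for a, b in zip(seq[i:i + w], g[i:i + w]))
                   for g in germlines)
        total += best / w
    return total / n_windows

def optimize_hsc(parent, germlines, masked, w=9, sweeps=5, seed=0):
    """Try single substitutions at unmasked positions, drawing candidate
    residues from the germlines, and accept any that increase HSC."""
    rng = random.Random(seed)
    seq = list(parent)
    score = hsc(seq, germlines, w)
    free = [p for p in range(len(seq)) if p not in masked]
    for _ in range(sweeps):
        rng.shuffle(free)
        improved = False
        for pos in free:
            options = sorted({g[pos] for g in germlines} - {seq[pos]})
            rng.shuffle(options)
            for aa in options:
                trial = seq[:pos] + [aa] + seq[pos + 1:]
                trial_score = hsc(trial, germlines, w)
                if trial_score > score:
                    seq, score, improved = trial, trial_score, True
                    break
        if not improved:
            break
    return "".join(seq), score

# Toy 16-residue sequences, made up for the example (not C225 or germlines).
toy_parent = "QVQLKQSGPGLVQPSQ"
toy_germlines = ["QVQLQESGPGLVKPSQ", "EVQLVESGGGLVQPGG"]
toy_masked = {0, 1, 2}           # hypothetical masked positions
runs = [optimize_hsc(toy_parent, toy_germlines, toy_masked, seed=s)
        for s in range(5)]
```

Running the search from several random seeds, as in the last line, mirrors the idea of separate iterations producing a diverse set of output sequences, from which a nonredundant set can then be taken.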

A second set of similar calculations was run on the C225 VL and VH sequences, except that the algorithm was allowed to sample multiple amino acid substitutions, rather than only single substitutions, in order to optimize HSC. FIG. 31 shows the nonredundant set of output sequences from these calculations for the C225 VL and VH regions (SEQ ID NOS:275-378), referred to as C225 VL HSC Calculation 2 and C225 VH HSC Calculation 2, respectively. Here, two measures of structural fitness, referred to as “Structural Consensus” and “Structural Precedence” (U.S. Ser. No. 60/528,229 and U.S. Ser. No. 60/602,566), are used to evaluate the structural and functional integrity of the sequences, in addition to HSC score. The output sequences were clustered based on their mutational distance from the other sequences in the set, and these clusters are delineated by the horizontal black lines in the Figure.
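Clustering by mutational distance can be illustrated with a simple single-linkage grouping on Hamming distance. The text does not specify which clustering method or distance cutoff was used, so both are assumptions here, as are the function names and the toy sequences; the sketch also assumes all sequences have equal length.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_by_distance(seqs, max_dist):
    """Single-linkage clustering: a sequence joins (and merges) every
    existing cluster containing a member within max_dist mutations."""
    clusters = []
    for s in seqs:
        linked, unlinked = [], []
        for c in clusters:
            near = any(hamming(s, m) <= max_dist for m in c)
            (linked if near else unlinked).append(c)
        merged = [s] + [m for c in linked for m in c]
        clusters = unlinked + [merged]
    return clusters

# Toy sequences, for illustration only.
toy_outputs = ["AAAA", "AAAT", "AATT", "CCCC", "CCCG"]
toy_clusters = cluster_by_distance(toy_outputs, max_dist=1)
```

With single linkage, "AAAA" and "AATT" end up in the same cluster even though they differ at two positions, because each is within one mutation of "AAAT"; a stricter linkage rule would split them.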

The calculations described above and presented in FIGS. 30 and 31 were used to generate a set of C225 VL and VH variants (SEQ ID NOS:238-378). In some cases, further substitutions were made to output sequences, using string and structural scores, as well as visual inspection of the modeled C225 structure, to evaluate fitness. FIGS. 32-40 present the sequences (SEQ ID NOS:379-387), structural scores, string scores, and mapped mutational differences on the modeled C225 structure for each of the C225 VL and VH variants. Iteration 21 from C225 VL HSC calculation 1 served as the precursor for L2 C225 VL, iteration 17 from C225 VL HSC calculation 2 served as the precursor for L3 C225 VL, and iteration 38 from C225 VL HSC calculation 2 served as the precursor for L4 C225 VL. Iteration 23 from C225 VH HSC calculation 1 served as the precursor for H3, H4, and H5 C225 VH, iteration 5 from C225 VH HSC calculation 2 served as the precursor for H6 C225 VH, iteration 41 from C225 VH HSC calculation 2 served as the precursor for H7 C225 VH, and iteration 44 from C225 VH HSC calculation 2 served as the precursor for H8 C225 VH.

Tables 7 and 8 present the mutational, structural fitness, and host string content scores for the C225 VL and VH variants as compared to the WT and CDR grafted C225 sequences. In addition, the maximum identity match to the germline for each string in the sequences was determined, referred to as N_(ID)max. This represents the total number of strings in each sequence whose maximum identity to the corresponding strings in the human germline is the indicated value. For w=9, Tables 7 and 8 list N₉max, N₈max, N₇max, and N_(≦6)max for each sequence. Also provided is the framework region homogeneity. In addition to the aforementioned structural and host string analysis, each sequence was analyzed for its global homology to the human germline; Tables 7 and 8 present the most homologous human germline sequence for each sequence (Closest Germline) and the corresponding identity to that germline (ID to Closest Germline), determined using the sequence alignment program BLAST.

TABLE 7
C225 VL Variants

                                WT      CDR Graft   L2      L3      L4
Mutations                       -       25          17      21      18
Structural Consensus            0.49    0.52        0.56    0.58    0.54
Structural Precedence           0.53    0.56        0.57    0.59    0.57
Human String Content            0.79    0.94        0.91    0.92    0.91
Human String Similarity         0.15    0.65        0.51    0.58    0.57
Framework Region Homogeneity    -       0.97        0.52    0.50    0.78
N₉max                           13      69          52      60      58
N₈max                           27      15          28      24      23
N₇max                           37      22          20      16      19
N_(≦6)max                       30      1           7       7       7
Closest Germline                6D-21   6D-21       3-11    1D-13   6D-21
ID to Closest Germline          63/95   87/95       72/95   73/94   79/95
ID to Closest Germline (%)      66%     91%         75%     77%     83%

TABLE 8
C225 VH Variants

                                WT      CDR Graft   H3      H4      H5      H6      H7      H8
Mutations                       -       33          18      21      15      21      22      28
Structural Consensus            0.44    0.48        0.51    0.46    0.49    0.52    0.49    0.53
Structural Precedence           0.55    0.54        0.55    0.54    0.51    0.55    0.58    0.55
Human String Content            0.67    0.84        0.79    0.81    0.77    0.79    0.79    0.79
Human String Similarity         0.04    0.56        0.36    0.41    0.33    0.36    0.35    0.33
Framework Region Homogeneity    -       0.97        0.45    0.52    0.50    0.50    0.76    0.77
N₉max                           3       66          42      48      38      42      41      39
N₈max                           23      17          25      24      17      24      21      27
N₇max                           32      10          19      16      26      23      25      23
N_(≦6)max                       61      26          33      31      38      30      32      30
Closest Germline                4-30-4  4-30-4      4-30-4  2-26    4-30-4  3-33    4-30-4  3-33
ID to Closest Germline          56/99   88/99       67/99   74/99   64/99   69/99   67/99   80/99
ID to Closest Germline (%)      56%     88%         66%     74%     64%     70%     67%     81%
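The N_(ID)max counts reported in Tables 7 and 8 can be illustrated with a minimal sketch. It assumes the query and germline sequences are already aligned and of equal length, so that "corresponding strings" are simply the same window positions; the function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
from collections import Counter

def window_max_identities(seq, germlines, w=9):
    """For each length-w string of seq, return the highest identity count
    against the same window of any pre-aligned germline sequence."""
    best = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        best.append(max(
            sum(a == b for a, b in zip(window, g[i:i + w]))
            for g in germlines
        ))
    return best

def n_id_max(seq, germlines, w=9):
    """Histogram of per-string maximum identities, keyed by identity count."""
    return Counter(window_max_identities(seq, germlines, w))

# Hypothetical toy sequences (assumption: aligned, equal length).
germlines = ["DIQMTQSPSSLSASV", "EIVLTQSPATLSLSP"]
variant = "DIQLTQSPATLSASV"
counts = n_id_max(variant, germlines)
print(dict(counts))
```

As a consistency check on the tables, weighting the CDR graft counts of Table 7 (69, 15, 22, 1) by 9, 8, 7, and 6 and dividing by 9 × 107 gives approximately 0.94, which agrees with the Human String Content row. This suggests that HSC behaves like the mean best identity per string divided by w, although that reading is an inference from the tabulated values rather than a restatement of Equation 3.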

Again, whereas the CDR grafted C225 antibodies are most homologous to a single human germline sequence, the C225 variants of the present invention are homologous to different human germline sequences in different regions of the sequence. Whereas CDR grafted C225 VH is most homologous to human germline subfamily 4 across its entire sequence, H4 C225 VH is most homologous to subfamily 4 in FR1, subfamily 3 in FR2, and subfamily 2 in FR3. Additionally, whereas the CDR grafted antibodies are most homologous to a single germline sequence that is also the most homologous sequence to the parent sequence, the present invention presents a set of variants for a given parent antibody that are most homologous to different human germline sequences, which need not be the most homologous germline sequence to WT. For example, Table 7 shows that CDR grafted C225 VL is most homologous to vlk_6D-21, which is also the most homologous human germline to WT C225. However, L2, L3, and L4 are most homologous to three different human germlines: vlk_3-11, vlk_1D-13, and vlk_6D-21, respectively. Thus the variants of the present invention explore a substantially greater amount of diversity than CDR grafted antibodies.

The genes for the C225 variable regions were constructed as described above, and subcloned into a modified pASK84 vector (Skerra, 1994, Gene 141: 79-84) comprising mouse constant regions for expression as Fabs. Select C225 variants were experimentally tested for their capacity to bind EGFR antigen. L2/H3 and L2/H4 C225 Fabs were expressed from the pASK84 vector in E. coli with a His-tag, and purified using nickel-affinity chromatography. Antigen affinity of the C225 variants was tested using SPR in a manner similar to that described above. EGFR extracellular domain (purchased commercially from R&D Systems) was covalently coupled to the dextran matrix of a CM5 chip using NHS-linkage chemistry. C225 Fabs were reacted with the EGFR sensor chip surface at varying concentrations. Global Langmuir fits were carried out for the concentration series using the BiaEvaluation curve fitting software. The on-rate constant (ka), off-rate constant (kd), equilibrium binding constant (KD=kd/ka), and predicted saturation binding signal (Rmax) derived from these fits are presented in Table 9, along with the Chi2, which quantifies the average deviation of the fit curve from the actual data curve. The data indicate that both the L2/H3 and L2/H4 C225 variants bind EGFR antigen.

TABLE 9
SPR data on C225 Variants

C225     ka (1/Ms)     kd (1/s)       KD (M)         Rmax (RU)   Chi2
L2/H3    2.79 × 10⁴    5.35 × 10⁻³    1.92 × 10⁻⁷    174         8.83
L2/H4    1.79 × 10⁴    4.73 × 10⁻³    2.64 × 10⁻⁷    153         2.69
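The relationship KD=kd/ka stated above can be checked directly against the Table 9 entries. The short sketch below recomputes KD from the tabulated rate constants; the function and variable names are illustrative only.

```python
def equilibrium_constant(ka, kd):
    """Equilibrium binding constant KD (M) from the on-rate ka (1/Ms)
    and the off-rate kd (1/s)."""
    return kd / ka

# Rate constants copied from Table 9: variant -> (ka, kd).
table9 = {"L2/H3": (2.79e4, 5.35e-3), "L2/H4": (1.79e4, 4.73e-3)}

KD_by_variant = {
    name: equilibrium_constant(ka, kd) for name, (ka, kd) in table9.items()
}
for name, value in KD_by_variant.items():
    print(f"{name}: KD = {value:.2e} M")
```

Both recomputed values round to the KD column of Table 9 (1.92 × 10⁻⁷ M and 2.64 × 10⁻⁷ M).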

In order to investigate the anti-EGFR variants in the context of a full-length antibody, the C225 WT (L0 and H0) and variant (L2, L3, L4, H3, H4, H5, H6, H7, and H8) regions were subcloned into the mammalian expression vector pcDNA3.1Zeo (Invitrogen) as described above. All combinations of the light and heavy chain plasmids were co-transfected into 293T cells, and antibodies were expressed, harvested, and purified as described above. Binding of the C225 WT (L0/H0) and variant (L0/H3, L0/H4, L0/H5, L0/H6, L0/H7, L0/H8, L2/H3, L2/H4, L2/H5, L2/H6, L2/H7, L2/H8, L3/H3, L3/H4, L3/H5, L3/H6, L3/H7, L3/H8, L4/H3, L4/H4, L4/H5, L4/H6, L4/H7, and L4/H8) antibodies was determined using SPR in a manner similar to that described above. Full-length antibodies were flowed over the EGFR sensor chip described above. FIG. 41 shows the SPR sensorgrams obtained from the experiments. The curves consist of an association phase and a dissociation phase, the separation being marked by a small spike on each curve. As a very rough approximation, the signal level reached near the end of the association phase can be used as an indicator of relative binding. For all the curves this signal level is within 25% of the average level, indicating that none of the antibody variants has significantly lost its ability to bind EGFR.
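The pairing scheme and the rough 25% criterion described above can be sketched as follows. The chain names come from the text; the signal levels are hypothetical placeholders, not values read from FIG. 41, and the helper name within_tolerance is an assumption made for illustration.

```python
from itertools import product
from statistics import mean

light = ["L0", "L2", "L3", "L4"]
heavy = ["H3", "H4", "H5", "H6", "H7", "H8"]

# WT pairing plus every light/heavy variant combination listed in the text.
pairings = ["L0/H0"] + [f"{l}/{h}" for l, h in product(light, heavy)]

def within_tolerance(signals, tol=0.25):
    """Flag each antibody whose end-of-association signal lies within
    tol (as a fraction) of the average signal across all antibodies."""
    avg = mean(signals.values())
    return {name: abs(s - avg) <= tol * avg for name, s in signals.items()}

# Hypothetical response levels in RU (assumption, for illustration only).
example = {"L0/H0": 410.0, "L2/H3": 380.0, "L4/H8": 330.0}
flags = within_tolerance(example)
print(len(pairings), flags)
```

A criterion of this kind only compares plateau signal levels; it does not replace the kinetic fits used for the Fab measurements.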

To assess the capacity of the anti-EGFR antibodies to mediate effector function against EGFR expressing cells, the C225 variants were tested in a cell-based ADCC assay. Human peripheral blood mononuclear cells (PBMCs) were used as effector cells, A431 epidermoid carcinoma cells were used as target cells, and lysis was monitored by measuring LDH activity using the Cytotoxicity Detection Kit as described above. FIG. 42 shows the dose dependence of ADCC at various antibody concentrations for WT and variant C225 antibodies. The results show that a number of the C225 variants have ADCC comparable to or better than that of WT C225 with respect to potency and efficacy. These data may be weighed together with the antigen affinity data and other data to choose the optimal anti-EGFR clinical candidate. As exemplified above with AC10 variants, combinations of the C225 variants of the present invention with amino acid modifications that alter effector function are contemplated.

Example 3 Immunogenicity Reduction of ICR62

To further illustrate application of the method described in the present invention, and to validate its broad applicability to immunogenicity reduction of proteins, an example is provided using as the parent sequence the anti-EGFR antibody ICR62 (Institute of Cancer Research) (PCT WO 95/20045; Modjtahedi et al., 1993, J. Cell Biophys. 22(1-3):129-46; Modjtahedi et al., 1993, Br J Cancer 67(2):247-53; Modjtahedi et al., 1996, Br J Cancer 73(2):228-35; Modjtahedi et al., 2003, Int J Cancer 105(2):273-80). A structural model of the rat ICR62 variable region was constructed using standard antibody modeling methods known in the art. FIGS. 43 and 44 show the sequences (SEQ ID NOS:388 and 389), host string content, and structures of the ICR62 VL and VH domains. A CDR graft of this antibody was constructed by placing the ICR62 CDRs into the context of the frameworks of the most homologous human germlines, determined to be vlk_1-17 for VL and vh_1-f for VH using the sequence alignment program BLAST. The sequences and string content of these CDR grafts are shown in FIGS. 45 and 46 (SEQ ID NOS:390 and 391), along with structures of modeled ICR62 highlighting the mutational differences between the CDR grafted ICR62 variable chains and WT.

Variants with reduced immunogenicity were generated by applying a string optimization algorithm to the WT ICR62 VL and VH sequences, in a manner similar to that described above for AC10, except that single instead of multiple amino acid substitutions were sampled. HSC of each sequence was optimized using a window size w=9, and the same set of CDR and VL/VH interface proximal residues was masked. The calculation was run for ICR62 VL and VH in 100 separate iterations, generating a set of diverse ICR62 variants with more host string content than WT. FIG. 47 shows the nonredundant set of output sequences (SEQ ID NOS:392 and 431) from these calculations for the ICR62 VL and VH regions, referred to as ICR62 VL HSC Calculation 1 and ICR62 VH HSC Calculation 1, respectively. In addition to the HSC score, the structural consensus and structural precedence of each sequence were evaluated (U.S. Ser. No. 60/528,229, filed Dec. 8, 2003, entitled Protein Engineering with Analogous Contact Environments) in order to evaluate its structural integrity.
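A minimal sketch of a single-substitution string optimization of the kind outlined above is given below. It is not the algorithm used to produce FIG. 47. It assumes pre-aligned, equal-length sequences, scores HSC as the mean best per-window identity divided by w (an assumption, since Equation 3 is not reproduced in this section), draws candidate residues only from the germline residues at the sampled position (also an assumption), and accepts a substitution only when the score rises. All names and the toy sequences are hypothetical.

```python
import random

def hsc(seq, germlines, w=9):
    """Assumed HSC: mean best per-window identity to any aligned germline,
    scaled to the range 0..1."""
    n = len(seq) - w + 1
    total = 0
    for i in range(n):
        total += max(
            sum(a == b for a, b in zip(seq[i:i + w], g[i:i + w]))
            for g in germlines
        )
    return total / (n * w)

def optimize_hsc(parent, germlines, masked, w=9, max_steps=50, rng=None):
    """One iteration: sample single substitutions at unmasked positions and
    keep only those that increase the assumed HSC score."""
    rng = rng or random.Random()
    seq = list(parent)
    free = [i for i in range(len(seq)) if i not in masked]
    best = hsc("".join(seq), germlines, w)
    for _ in range(max_steps):
        pos = rng.choice(free)
        # Candidate residues: those seen at this position in some germline.
        candidates = sorted({g[pos] for g in germlines} - {seq[pos]})
        if not candidates:
            continue
        old = seq[pos]
        seq[pos] = rng.choice(candidates)
        score = hsc("".join(seq), germlines, w)
        if score > best:
            best = score
        else:
            seq[pos] = old  # reject and restore the previous residue
    return "".join(seq), best

def run_iterations(parent, germlines, masked, n_iter=100, seed=0):
    """Run independent iterations and return the nonredundant outputs,
    highest score first."""
    results = set()
    for k in range(n_iter):
        results.add(
            optimize_hsc(parent, germlines, masked, rng=random.Random(seed + k))
        )
    return sorted(results, key=lambda r: -r[1])

# Hypothetical toy data; positions 8 and 9 stand in for masked residues.
germlines = ["DIQMTQSPSSLSASV", "EIVLTQSPATLSLSP"]
parent = "DIQLTQSPATLSASV"
masked = {8, 9}
variants = run_iterations(parent, germlines, masked, n_iter=5)
```

Collecting the outputs in a set mirrors the reporting of a nonredundant set of output sequences, and the masked positions are never altered because substitutions are drawn only from the unmasked positions.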

The calculations described above and presented in FIG. 47 were used to generate a set of ICR62 VL and VH variants (SEQ ID NOS:392 and 455). In some cases, further substitutions were made to output sequences, using HSC and Structural Precedence scores, as well as visual inspection of the modeled ICR62 structure, to evaluate fitness. FIGS. 48-50 present the sequences (SEQ ID NOS:456 and 458), host string content, and mapped mutational differences on the modeled ICR62 structure for each of the ICR62 VL and VH variants. Iteration 20 from ICR62 VL HSC calculation 1 served as the precursor for L2 ICR62 VL. Iteration 1 from ICR62 VH HSC calculation 1 served as the precursor for H9, and iteration 5 from ICR62 VH HSC calculation 2 served as the precursor for H10 ICR62 VH.

Tables 10 and 11 present the mutational, structural fitness, and host string content scores for the ICR62 VL and VH variants as compared to the WT and CDR grafted ICR62 sequences. In addition, the maximum identity match to the germline for each string in the sequences, referred to as N_(ID)max, is also provided, as well as the framework region homogeneity. In addition to the aforementioned structural and host string analysis, each sequence was analyzed for its global homology to the host germline; Tables 10 and 11 present the most homologous host germline sequence for each sequence (Closest Germline) and the corresponding identity to that germline (ID to Closest Germline), determined using the sequence alignment program BLAST.

TABLE 10
ICR62 VL Variants

                                WT      CDR Graft   L2
Mutations                       0       11          6
Structural Consensus            0.56    0.60        0.61
Structural Precedence           0.52    0.58        0.57
Human String Content            0.86    0.91        0.90
Human String Similarity         0.38    0.58        0.56
Framework Region Homogeneity    0.62    0.97        0.64
N₉max                           37      59          56
N₈max                           26      21          19
N₇max                           31      18          22
N_(≦6)max                       13      9           10
Closest Germline                1-17    1-17        1-17
ID to Closest Germline          76/95   86/95       81/95
ID to Closest Germline (%)      80%     90%         85%

TABLE 11
ICR62 VH Variants

                                WT      CDR Graft   H9      H10
Mutations                       0       34          20      21
Structural Consensus            0.43    0.44        0.46    0.45
Structural Precedence           0.42    0.52        0.47    0.49
Human String Content            0.64    0.85        0.79    0.79
Human String Similarity         0.01    0.54        0.28    0.33
Framework Region Homogeneity    -       1.00        0.64    0.85
N₉max                           1       64          33      39
N₈max                           16      24          33      30
N₇max                           35      14          28      25
N_(≦6)max                       67      17          25      25
Closest Germline                1-f     1-f         1-f     1-f
ID to Closest Germline          60/98   92/98       72/98   77/98
ID to Closest Germline (%)      61%     93%         73%     79%

Again, as observed from the significant differences in FRH and closest germlines, the ICR62 variants are homologous to different host germline sequences in different regions of the sequence. The genes for the ICR62 WT and L2/H9 variable regions were constructed as described above, and subcloned into a modified pASK84 vector (Skerra, 1994, Gene 141: 79-84). The ICR62 Fabs were experimentally tested for their capacity to bind EGFR antigen. WT and L2/H9 ICR62 Fabs were expressed from the pASK84 vector in E. coli with a His-tag, and purified using nickel-affinity chromatography. Antigen affinity of the ICR62 antibodies was tested using SPR in a manner similar to that described above, with EGFR covalently coupled to the CM5 chip and reacted with ICR62 antibodies at varying concentrations. Fits were carried out for the concentration series using the BiaEvaluation curve fitting software, as described above, and the fits to the data are provided in Table 12. As can be seen, L2/H9 ICR62 binds the EGFR antigen with affinity comparable to that of WT.

TABLE 12
SPR data on ICR62 Variants

ICR62    ka (1/Ms)     kd (1/s)       KD (M)          Rmax (RU)   Chi2
WT       9.86 × 10⁴    2.53 × 10⁻⁵    2.57 × 10⁻¹⁰    402         1.86
L2/H9    2.35 × 10⁵    1.06 × 10⁻⁴    4.50 × 10⁻¹⁰    508         4.91
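To put the comparison in Table 12 on a single scale, the ratio of the two equilibrium constants can be computed directly. The snippet below uses only the KD values tabulated above; the variable names are illustrative.

```python
# KD values (M) copied from Table 12.
table12_KD = {"WT": 2.57e-10, "L2/H9": 4.50e-10}

# Fold difference in KD of the variant relative to WT.
fold = table12_KD["L2/H9"] / table12_KD["WT"]
print(f"L2/H9 KD is {fold:.2f}-fold that of WT")
```

The ratio is below twofold; whether a difference of that size counts as comparable is the judgment made in the text, not something the arithmetic decides.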

Example 4 String Diversity Exploration of Immunoglobulins

The generation of mutational diversity based on HSC is much broader than the primary variant/secondary variant strategy described above for H3/L3 AC10. Indeed, substitutions can be designed for any parent protein wherein the substitutions result in positive or neutral impact on the host string content of the parent sequence. Again, the advantage of such a strategy is that it generates a diverse set of minimally immunogenic variants that have the potential for optimized properties, including but not limited to antigen affinity, activity, specificity, solubility, expression level, and effector function. Such a set of variants may be designed, for example, to explore diversity for other parent immunoglobulins, including but not limited to nonhuman antibodies, humanized or otherwise engineered antibodies (Clark, 2000, Immunol Today 21:397-402), (Tsurushita & Vasquez, 2004, Humanization of Monoclonal Antibodies, Molecular Biology of B Cells, 533-545, Elsevier Science (USA)), and “fully human” antibodies, obtained for example using transgenic mice (Bruggemann et al., 1997, Curr Opin Biotechnol 8:455-458) or human antibody libraries coupled with selection methods (Griffiths et al., 1998, Curr Opin Biotechnol 9:102-108).
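The idea of collecting every substitution with a positive or neutral impact on host string content can be sketched as an exhaustive scan over single substitutions. The sketch assumes pre-aligned, equal-length sequences and uses the summed best per-window identity as an integer stand-in for HSC (an assumption, since Equation 3 is not reproduced in this section). The function names and toy sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def total_identity(seq, germlines, w=9):
    """Sum over all length-w strings of the best identity count to the
    corresponding string of any aligned germline (integer HSC proxy)."""
    return sum(
        max(sum(a == b for a, b in zip(seq[i:i + w], g[i:i + w]))
            for g in germlines)
        for i in range(len(seq) - w + 1)
    )

def non_decreasing_substitutions(parent, germlines, w=9):
    """List (position, old, new, delta) for every single substitution whose
    effect on the assumed score is neutral (delta == 0) or positive."""
    base = total_identity(parent, germlines, w)
    hits = []
    for pos, old in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa == old:
                continue
            variant = parent[:pos] + aa + parent[pos + 1:]
            delta = total_identity(variant, germlines, w) - base
            if delta >= 0:
                hits.append((pos, old, aa, delta))
    return hits

# Hypothetical toy sequences (assumption: aligned, equal length).
germlines = ["DIQMTQSPSSLSASV", "EIVLTQSPATLSLSP"]
parent = "DIQLTQSPATLSASV"
hits = non_decreasing_substitutions(parent, germlines)
improving = [h for h in hits if h[3] > 0]
```

In practice such a list would be filtered further, for example by the structural scores and masking described in the earlier examples, before any variant is constructed.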

Example 5 Unique Properties of Variant Proteins Generated by the Methods of the Present Invention

The methods described in the present invention generate variant proteins that possess a number of unique properties relative to variant proteins generated by other methods that attempt to achieve the same or similar goal. FIGS. 51 and 52 provide the host string content (HSC, Equation 3), exact string content (ESC, Equation 3a), and framework region homogeneity (FRH, Equation 10) of the AC10, C225, and ICR62 VH (FIG. 51) and VL (FIG. 52) variants of the present invention, compared with a number of antibody variable regions “humanized” by methods in the prior art. If a variant sequence's exact string content is derived solely from a single germline sequence, the FRH would be close to 1.0. Alternatively, as is the case with many of the variant sequences created by the present invention, FRH values can be significantly less than 1, with values ranging from 0.4 to 1.0, indicating, as expected, that sequences with high exact string content can be discovered with contributions from multiple germline subfamilies and sequences. At the same time, the variant sequences engineered using the present invention have high host string content, and thus are predicted to have low potential for immunogenicity in humans. For example, as shown in FIG. 51, variant VH sequences generated using the present invention have HSC values generally higher than 75%, and many of them have FRH values lower than 0.6, indicating that their HSC is derived from multiple germline frameworks. As shown in FIG. 52, similar trends apply for variant VL sequences generated using the present invention.
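The qualitative behavior of ESC and FRH described above can be illustrated with a toy calculation. Equations 3a and 10 are not reproduced in this section, so the definitions below are assumptions chosen only to match the prose: ESC is taken as the fraction of length-w strings with an exact match in some germline, and FRH as the largest share of those exact-match strings attributable to any single germline. The sketch also ignores the restriction to framework regions, and the germline names and sequences are hypothetical.

```python
def exact_matches(seq, germlines, w=9):
    """For each length-w string of seq, the set of germline names whose
    corresponding string is identical."""
    return [
        {name for name, g in germlines.items() if g[i:i + w] == seq[i:i + w]}
        for i in range(len(seq) - w + 1)
    ]

def esc(seq, germlines, w=9):
    """Assumed ESC: fraction of strings with an exact germline match."""
    hits = exact_matches(seq, germlines, w)
    return sum(bool(h) for h in hits) / len(hits)

def frh(seq, germlines, w=9):
    """Assumed FRH: largest share of exact-match strings explained by a
    single germline; None when there are no exact-match strings."""
    hits = [h for h in exact_matches(seq, germlines, w) if h]
    if not hits:
        return None
    return max(sum(name in h for h in hits) for name in germlines) / len(hits)

# Hypothetical aligned germlines and a chimera built from both.
germlines = {
    "g1": "DIQMTQSPSSLSASVGDRVT",
    "g2": "EIVLTQSPATLSLSPGERAT",
}
chimera = germlines["g1"][:10] + germlines["g2"][10:]

single_source = frh(germlines["g1"], germlines)
mixed_source = frh(chimera, germlines)
chimera_esc = esc(chimera, germlines)
```

Under these assumed definitions a sequence copied from one germline scores 1.0, while the chimera scores lower because its exact-match strings are split between two germlines, which is the trend the paragraph above describes.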

Whereas particular embodiments of the invention have been described above for purposes of illustration, it will be appreciated by those skilled in the art that numerous variations of the details may be made without departing from the invention as described in the appended claims. All references are herein expressly incorporated by reference.

1. An antibody that binds the CD30 antigen, wherein said antibody comprises: a) a heavy chain comprising a heavy chain variable region and a constant heavy region, wherein said variable heavy region is selected from the group consisting of: i) the sequence of SEQ ID NO:223; ii) the sequence of SEQ ID NO:226; iii) the sequence of SEQ ID NO:227; iv) the sequence of SEQ ID NO:228; and b) a light chain comprising a light variable region and a light constant region, wherein said light variable region is selected from the group consisting of: i) the sequence of SEQ ID NO:220; ii) the sequence of SEQ ID NO:224; and iii) the sequence of SEQ ID NO:225; wherein the numbering of the variable region is according to the Kabat numbering system.

2. The antibody according to claim 1, wherein said variable light chain is the sequence of SEQ ID NO:224.

3. The antibody according to claim 1 or 2, wherein said variable heavy chain is the sequence of SEQ ID NO:227.

4. The antibody according to claim 3, wherein said heavy chain constant region comprises a substitution at position selected from the group consisting of 239, and 332, wherein number of the constant region is according to the EU index.

5. The antibody according to claim 4, wherein said substitution comprises 332E.

6. The antibody according to claim 4, wherein said heavy chain constant region comprises 239D and 332E.

7. An antibody that binds the CD30 antigen, wherein said antibody comprises: a) a heavy chain variable region comprising: i) a first CDR comprising amino acids 31-35 of SEQ ID NO:227; ii) a second CDR comprising amino acids 50-66 of SEQ ID NO:227; iii) a third CDR comprising amino acids 99-106 of SEQ ID NO:227; and b) a light chain variable region comprising: i) a first CDR comprising amino acids 24-38 of SEQ ID NO:224; ii) a second CDR comprising amino acids 54-60 of SEQ ID NO:224; iii) a third CDR comprising amino acids 93-101 of SEQ ID NO:224; wherein the numbering of the variable region is according to the Kabat numbering system.

8. A polynucleotide encoding the heavy and light chains of a human monoclonal antibody that competes for binding to human CD30 with an antibody comprising the heavy chain FR1 through FR4 amino acids sequences of SEQ ID NO:227 and the light chain FR1 through FR4 amino acid sequence of SEQ ID NO:224.

9. An expression vector comprising the polynucleotide of claim 7.

10. A mammalian host cell line comprising the polynucleotide of claim 7.

11. A method of generating a variant protein for a host as compared to a parent protein comprising: a) comparing said parent protein sequence with two or more natural protein sequences from said host species; b) analyzing one or more amino acid strings of said parent protein sequence with a corresponding amino acid string of each natural protein sequence; c) substituting one or more amino acids of said parent protein sequence with a corresponding amino acid string of a natural protein sequence on an amino acid string by amino acid string basis; d) wherein said variant protein has increased host string content as compared to said parent protein; and e) wherein said substituted amino acids include a first substitution in a first string from a first natural protein, and a second substitution in a second string from a second natural protein.